What is the difference between the fundamental, essential and homography matrices? - opencv

I have two images that are taken from different positions. The 2nd camera is located to the right, up and backward with respect to 1st camera.
So I think there is a perspective transformation between the two views and not just an affine transform since cameras are at relatively different depths. Am I right?
I have a few corresponding points between the two images. I am thinking of using them to determine the transformation of each pixel from the 1st to the 2nd image.
I am confused by the functions findFundamentalMat and findHomography. Both return a 3x3 matrix. What is the difference between the two?
Are there any conditions or prerequisites for using them (when should each be used)?
Which one should I use to transform points from the 1st image to the 2nd image? Do the 3x3 matrices these functions return include the rotation and translation between the two image frames?
From Wikipedia, I read that the fundamental matrix is a relation between corresponding image points. In an SO answer here, it is said the essential matrix E is required to get corresponding points. But I do not have the internal camera matrix to calculate E. I just have the two images.
How should I proceed to determine the corresponding point?

Without any extra assumption about the geometry of the scene, you cannot assume that there is a projective transformation between the two views. This holds only if the scene is planar (or if the camera purely rotates about its center, which is not your case since the second camera has moved). A good reference on the topic is the book Multiple View Geometry in Computer Vision by Hartley and Zisserman.
If the scene is not planar, you should definitely not use the findHomography function. You can use the findFundamentalMat function, which gives you an estimate of the fundamental matrix F. This matrix describes the epipolar geometry between the two views. You can use F to rectify your images and then apply stereo algorithms to determine a dense correspondence map.
I assume you are using the expression "perspective transformation" to mean "projective transformation". To the best of my knowledge, a perspective transformation is a world to image mapping, not an image to image mapping.

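The fundamental matrix F satisfies the relation
x^T F u = 0
with u a point in one image and x a point in the other (both in homogeneous coordinates) whenever x and u are projections of the same 3D point. The converse does not hold in general: every x on the epipolar line of u satisfies the relation too.
Also,
l = F u
defines a line (l^T x = 0) on which the point corresponding to u must lie, so it can be used to restrict the search space for correspondences.
A homography maps a point in one projection of a plane to the corresponding point in another projection of the same plane (up to scale):
x = H u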
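To make the epipolar relation concrete, here is a minimal NumPy sketch. The intrinsics K, the rotation R and the translation t are made-up values (assumptions, not from the question), and F is built from them with the standard formula F = K^-T [t]x R K^-1 for cameras K[I|0] and K[R|t]. It only checks that a true match gives a residual close to zero and lies on the epipolar line.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Assumed (made-up) intrinsics and relative pose of the second camera.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
angle = np.deg2rad(10.0)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.5, -0.2, 0.3])

# Fundamental matrix for cameras P1 = K[I|0] and P2 = K[R|t].
K_inv = np.linalg.inv(K)
F = K_inv.T @ skew(t) @ R @ K_inv

# Project one 3D point (given in camera-1 coordinates) into both views.
X = np.array([0.4, -0.1, 5.0])
u = K @ X               # homogeneous pixel in view 1
x = K @ (R @ X + t)     # homogeneous pixel in view 2
u, x = u / u[2], x / x[2]

residual = x @ F @ u    # x^T F u, close to 0 for a true correspondence
l = F @ u               # epipolar line of u in view 2
on_line = l @ x         # l^T x, the same quantity written as a line test

print("x^T F u =", residual)
print("epipolar line coefficients:", l)
```

Note that F only narrows the search to a line; it does not tell you where on the line the match is.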

There are only two cases where the transformation between two views is a projective transformation (i.e. a homography): either the scene is planar or the two views were generated by a camera rotating around its center.
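The rotating-camera case can be checked numerically. The sketch below assumes made-up intrinsics K and a rotation R about the camera center; in that situation the homography is H = K R K^-1, and it maps points at very different depths correctly, which would not be true if the camera had also translated.

```python
import numpy as np

# Assumed (made-up) intrinsics and a 15 degree rotation about the y axis.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
angle = np.deg2rad(15.0)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])

# Homography induced by a pure rotation about the camera center.
H = K @ R @ np.linalg.inv(K)

def to_pixel(v):
    """Convert a homogeneous 3-vector to 2D pixel coordinates."""
    return v[:2] / v[2]

# Two 3D points at very different depths (2 and 20 units).
points = [np.array([0.3, 0.2, 2.0]), np.array([-1.0, 0.5, 20.0])]

errors = []
for X in points:
    u = to_pixel(K @ X)                          # pixel in view 1
    x = to_pixel(K @ (R @ X))                    # pixel in view 2 (same center)
    x_from_H = to_pixel(H @ np.append(u, 1.0))   # view-1 pixel mapped by H
    errors.append(np.linalg.norm(x - x_from_H))

print("transfer errors in pixels:", errors)
```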

Related

Multiple camera view triangulation

Given a scene, multiple camera views of that scene and their corresponding projection matrices, we wish to triangulate 2D matching points into 3D.
What I've been doing so far is solving the system PX = alpha * x, where P is the projection matrix, X is the 3D point in camera coordinates, alpha is a scalar and x is the vector corresponding to the point in 2D. X and x are in homogeneous coordinates.
See https://tspace.library.utoronto.ca/handle/1807/10437 page 102 for more detail.
Solving this with an SVD yields proper results when the 2D points are accurately selected or when I only use two views. Introducing more views adds a lot of error.
Any advice on which techniques are best to improve/refine this solution and make it support more views?
If I understand correctly, we can view this as finding a point in 3D space that minimizes the sum of orthogonal distances between the point and lines (one line per camera view)? I guess that with gradient descent in 3D space it's possible to find a local minimum of this.
Did I understand the problem correctly?
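For reference, the SVD approach described in the question can be sketched for any number of views as follows. This is a minimal linear (DLT) triangulation on synthetic, noise-free data; the function names, the camera layout and the intrinsics are all assumptions made for illustration. The per-view reprojection error computed at the end is one simple way to spot a view whose 2D point or projection matrix is off.

```python
import numpy as np

def triangulate_dlt(projections, pixels):
    """Linear triangulation of one 3D point from N >= 2 views.

    projections: list of 3x4 matrices, pixels: list of (u, v) observations.
    """
    rows = []
    for P, (u, v) in zip(projections, pixels):
        # Each view contributes two linear equations in the homogeneous X.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.asarray(rows)
    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

def reprojection_errors(projections, pixels, X):
    """Pixel distance between each observation and the reprojected point."""
    Xh = np.append(X, 1.0)
    errs = []
    for P, uv in zip(projections, pixels):
        p = P @ Xh
        errs.append(np.linalg.norm(p[:2] / p[2] - np.asarray(uv)))
    return np.array(errs)

# Synthetic setup (made-up values): three cameras with the same intrinsics,
# identity rotation and different centers.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
centers = [np.array([0.0, 0.0, 0.0]),
           np.array([0.5, 0.0, 0.0]),
           np.array([1.0, 0.1, 0.0])]
projections = [K @ np.hstack([np.eye(3), -c.reshape(3, 1)]) for c in centers]

X_true = np.array([0.2, -0.1, 4.0])
pixels = []
for P in projections:
    p = P @ np.append(X_true, 1.0)
    pixels.append(p[:2] / p[2])

X_est = triangulate_dlt(projections, pixels)
errs = reprojection_errors(projections, pixels, X_est)
print("estimated point:", X_est)
print("reprojection error per view:", errs)
```

With noisy real data the linear solution is usually only a starting point; the reprojection errors above are the quantity a non-linear refinement would typically try to reduce.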

Homography and Affine Transformation

Hi, I am a beginner in computer vision and I would like to know what exactly the difference is between a homography and an affine transformation. If you want to find the translation between two images, which one would you use and why? From the papers and definitions I found online, I have yet to find the difference between them and where one is used instead of the other.
Thanks for your help.
A picture is worth a thousand words:
I have set it down in the terms of a layman.
Homography
A homography is a 3x3 matrix that maps each point of the first image to the corresponding point of the second image. See below, where H is the homography matrix being computed for the points (x1, y1) and (x2, y2).
Consider the points of the images present below:
In the case above, there are 4 homography matrices generated.
Where is it used?
You may want to align the above depicted images. You can do so by using the homography.
Here the second image is mapped with respect to the first
Another application is Panoramic Stitching
Visit THIS BLOG for more
Affine transformation
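An affine transformation is a more restricted mapping: a linear map (rotation, scaling, shear) followed by a translation, applied uniformly to the entire image. It can be written as a 2x3 matrix, or equivalently as a 3x3 homography whose last row is [0, 0, 1], so it has no perspective terms.
Hence in an affine transformation the parallelism of lines is always preserved (as mentioned by EdChum).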
Where is it used?
It is used in areas where you want to alter the entire image:
Rotation (self understood)
Translation (shifting the entire image by a certain length either to top/bottom or left/right)
Scaling (it is basically shrinking or blowing up an image)
See THIS PAGE for more
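A small numerical check of the parallelism point, as a sketch with made-up numbers: an affine matrix is composed from a scaling, a rotation and a translation, and a general homography is obtained from it by adding non-zero perspective terms to the last row. Two parallel segments stay parallel under the first and not under the second. The helper names are hypothetical.

```python
import numpy as np

def apply(M, pts):
    """Apply a 3x3 transform to an (N, 2) array of points."""
    ph = np.hstack([pts, np.ones((len(pts), 1))])   # to homogeneous
    out = ph @ M.T
    return out[:, :2] / out[:, 2:3]                 # back to 2D

theta = np.deg2rad(30.0)
rotation = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                     [np.sin(theta), np.cos(theta), 0.0],
                     [0.0, 0.0, 1.0]])
translation = np.array([[1.0, 0.0, 40.0],
                        [0.0, 1.0, -15.0],
                        [0.0, 0.0, 1.0]])
scaling = np.array([[1.5, 0.0, 0.0],
                    [0.0, 0.5, 0.0],
                    [0.0, 0.0, 1.0]])

# Composition of affine maps is affine: the last row stays [0, 0, 1].
A = translation @ rotation @ scaling

# Adding perspective terms to the last row gives a general homography.
H = A.copy()
H[2, :2] = [0.001, 0.002]

# Two parallel horizontal segments.
seg1 = np.array([[0.0, 0.0], [100.0, 0.0]])
seg2 = np.array([[0.0, 50.0], [100.0, 50.0]])

def parallel_after(M):
    """True if the two segments are still parallel after applying M."""
    a, b = apply(M, seg1), apply(M, seg2)
    d1, d2 = a[1] - a[0], b[1] - b[0]
    cross = d1[0] * d2[1] - d1[1] * d2[0]
    return abs(cross) / (np.linalg.norm(d1) * np.linalg.norm(d2)) < 1e-9

print("affine keeps lines parallel:    ", parallel_after(A))
print("homography keeps lines parallel:", parallel_after(H))
```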

Find projection matrix

First, I want to apologize for my bad English.
I am really new to OpenCV and to virtual reality. I have tried to learn the theory of image processing, but some points are still unclear to me. I learned that the projection matrix is a matrix that transforms a 3D point to 2D. Am I right? The essential matrix gives me information about the rotation between two cameras, and the fundamental matrix gives information about the relationship between a pixel in one image and a pixel in the other image. The homography matrix relates the coordinates of pixels in two images (is that correct?).
What is the difference between fundamental and homography matrix?
Do I need all these matrices to get projection matrix?
I am new in these, so please if you can, try to explain me it simply.
Thanks for your help.
"I learned that the projection matrix is a matrix that transforms a 3D point to 2D. Am I right?"
Yes. But usually these transformations are expressed in homogeneous coordinates. This means that 3D points are represented by 4-vectors (i.e. vectors of length 4), and 2D points are represented by 3-vectors.
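As an illustration of the homogeneous form, here is a minimal sketch with a 3x4 projection matrix P = K [R | t]. The values of K, R and t are made up for the example. A 3D point is written as a 4-vector, its image as a 3-vector, and the pixel is obtained by dividing by the last component.

```python
import numpy as np

# Assumed (made-up) intrinsics and pose.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([[0.1], [0.0], [0.0]])

P = K @ np.hstack([R, t])                # 3x4 projection matrix

X = np.array([0.5, -0.25, 4.0, 1.0])     # 3D point as a homogeneous 4-vector
x = P @ X                                # 2D point as a homogeneous 3-vector
pixel = x[:2] / x[2]                     # divide by the last component

# Homogeneous vectors are defined up to scale: 2*X is the same 3D point.
pixel_scaled = (P @ (2.0 * X))[:2] / (P @ (2.0 * X))[2]

print("P shape:", P.shape)
print("pixel:", pixel)
```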
"The homography matrix relates the coordinates of pixels in two images (is that correct?)"
No. This is true in two special cases only: when the scene lies on a plane, or when the two views have been generated by two cameras sharing the same center location.
In all other cases, i.e. when the scene is not planar and the two cameras have different centers, there is no homography transforming one image into the other.
"What is the difference between fundamental and homography matrix?"
There are many differences. From an algebraic point of view, the most obvious difference is that a homography matrix is non-singular (its rank is 3), while a fundamental matrix is singular (its rank is 2).
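The rank difference is easy to verify numerically. The sketch below assumes identity intrinsics and a made-up relative pose (R, t), so that F takes the form [t]x R; the homography is the one induced by an assumed plane n^T X = d in the first camera's frame, H = R + t n^T / d. All values are illustrative.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Assumed (made-up) relative pose; intrinsics are taken as identity.
angle = np.deg2rad(10.0)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([1.0, 0.2, -0.3])

# Fundamental matrix (with K = I this is the essential-matrix form).
F = skew(t) @ R

# Homography induced by the assumed plane n^T X = d in camera-1 coordinates.
n = np.array([0.0, 0.0, 1.0])
d = 5.0
H = R + np.outer(t, n) / d

rank_F = np.linalg.matrix_rank(F, tol=1e-9)
rank_H = np.linalg.matrix_rank(H, tol=1e-9)
det_F = np.linalg.det(F)
det_H = np.linalg.det(H)

print("rank(F) =", rank_F, " det(F) =", det_F)
print("rank(H) =", rank_H, " det(H) =", det_H)
```

One practical consequence: H can be inverted to map points back, while F cannot, which matches the fact that F maps a point to a line and not to a point.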

Project 2d points in camera 1 image to camera 2 image after a stereo calibration

I am doing stereo calibration of two cameras (let's name them L and R) with opencv. I use 20 pairs of checkerboard images and compute the transformation of R with respect to L. What I want to do is use a new pair of images, compute the 2d checkerboard corners in image L, transform those points according to my calibration and draw the corresponding transformed points on image R with the hope that they will match the corners of the checkerboard in that image.
I tried the naive way of transforming the 2d points from [x,y] to [x,y,1], multiplying by the 3x3 rotation matrix, adding the translation vector and then dividing by z, but the result is wrong, so I guess it's not that simple (?)
Edit (to clarify some things):
The reason I want to do this is basically because I want to validate the stereo calibration on a new pair of images. So, I don't actually want to get a new 2d transformation between the two images, I want to check if the 3d transformation I have found is correct.
This is my setup:
I have the rotation and translation relating the two cameras (E), but I don't have rotations and translations of the object in relation to each camera (E_R, E_L).
Ideally what I would like to do:
Choose the 2d corners in image from camera L (in pixels e.g. [100,200] etc).
Do some kind of transformation on the 2d points based on matrix E that I have found.
Get the corresponding 2d points in image from camera R, draw them, and hopefully they match the actual corners!
The more I think about it though, the more I am convinced that this is wrong/can't be done.
What I am probably trying now:
Using the intrinsic parameters of the cameras (let's say I_R and I_L), solve 2 least squares systems to find E_R and E_L
Choose 2d corners in image from camera L.
Project those corners to their corresponding 3d points (3d_points_L).
Do: 3d_points_R = (E_L).inverse * E * E_R * 3d_points_L
Get the 2d_points_R from 3d_points_R and draw them.
I will update when I have something new
It is actually easy to do, but you're making several mistakes. Remember that after stereo calibration, the rotation and translation you obtain (R and T) relate the position and orientation of the second camera to the first camera in the first camera's 3D coordinate system. Also remember that to find the 3D position of a point with a pair of cameras you need to triangulate it. By setting the z component to 1 you're making two mistakes. First, you have most likely used the common OpenCV stereo calibration code and given the distance between the corners of the checkerboard in cm. Hence z=1 means 1 cm away from the center of the camera, which is extremely close to the camera. Second, by setting the same z for all the points you are saying that the checkerboard is perpendicular to the principal axis (aka optical axis, or principal ray), while most likely that's not the case in your image. So you're transforming some virtual 3D points to the second camera's coordinate system and then projecting them onto the image plane.
If you just want to transform planar points, you can find the homography between the two views (OpenCV has the findHomography function) and use that.
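To see why the depth matters, here is a minimal sketch of transferring one pixel from camera L to camera R. It assumes the convention X_R = R @ X_L + T for the calibrated pose, identical made-up intrinsics for both cameras, and a hypothetical helper transfer_with_depth; none of these values come from the question. With the true depth the point lands where expected, with z = 1 it ends up far outside the image.

```python
import numpy as np

def transfer_with_depth(pixel_L, depth, K_L, K_R, R, T):
    """Map a pixel of camera L into camera R, given its depth in L's frame.

    Assumes the pose convention X_R = R @ X_L + T.
    """
    ray = np.linalg.inv(K_L) @ np.array([pixel_L[0], pixel_L[1], 1.0])
    X_L = depth * ray        # back-projected 3D point (ray has z == 1)
    X_R = R @ X_L + T        # same point in camera R's frame
    x = K_R @ X_R
    return x[:2] / x[2]

# Made-up calibration: same intrinsics, no rotation, 10 cm baseline along x.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
T = np.array([-10.0, 0.0, 0.0])   # units: cm

# A corner that is really at (5, 2, 100) cm in camera L's frame.
X_true = np.array([5.0, 2.0, 100.0])
pixel_L = (K @ X_true)[:2] / X_true[2]

good = transfer_with_depth(pixel_L, 100.0, K, K, R, T)   # true depth
naive = transfer_with_depth(pixel_L, 1.0, K, K, R, T)    # z = 1, as in the question

print("pixel in L:           ", pixel_L)
print("pixel in R, true depth:", good)
print("pixel in R, z = 1:     ", naive)
```

In practice the depth of each corner is not known from a single image, which is why the answer points to triangulation (or to a homography when the points are planar).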

Compute transformation matrix from a set of coordinates (with OpenCV)

I have a small cube with n (you can assume that n = 4) distinguished points on its surface. These points are numbered (1-n) and form a coordinate space, where point #1 is the origin.
Now I'm using a tracking camera to get the coordinates of those points, relative to the camera's coordinate space. That means that I now have n vectors p_i pointing from the origin of the camera to the cube's surface.
With that information, I'm trying to compute the affine transformation matrix (rotation + translation) that represents the transformation between those two coordinate spaces. The translation part is fairly trivial, but I'm struggling with the computation of the rotation matrix.
Is there any built-in functionality in OpenCV that might help me solve this problem?
Sounds like cvGetPerspectiveTransform is what you're looking for; cvFindHomography might also be helpful.
solvePnP should give you the rotation matrix and the translation vector. Try it with CV_EPNP or CV_ITERATIVE.
Edit: Or perhaps you're looking for RQ decomposition.
Look at the Stereo Camera tutorial for OpenCV. OpenCV uses a planar chessboard for all the computation and sets its Z-dimension to 0 to build its list of 3D points. You already have 3D points so change the code in the tutorial to reflect your list of 3D points. Then you can compute the transformation.
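Since the question already has the 3D coordinates of the points in both frames, one alternative that is not mentioned in the answers above is the SVD-based Kabsch method, which gives the least-squares rotation and translation between two sets of matched 3D points. The sketch below uses plain NumPy, not an OpenCV call, and the function name and test values are made up. It needs at least three non-collinear points.

```python
import numpy as np

def rigid_transform(src, dst):
    """Least-squares R, t such that dst ~= R @ src + t (Kabsch method).

    src, dst: (N, 3) arrays of matched 3D points, N >= 3, not collinear.
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Four marker points in the cube's own frame (point #1 is the origin).
cube = np.array([[0.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])

# Made-up ground-truth pose used to simulate the camera measurements.
a = np.deg2rad(40.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a), np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.3, -1.2, 2.5])
observed = cube @ R_true.T + t_true

R_est, t_est = rigid_transform(cube, observed)
print("R =\n", R_est)
print("t =", t_est)
```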
