Finding Rotation matrices between two cameras for "Stereorectify" - opencv

So I have a depth map and the extrinsics and intrinsics of the camera.I want to get back the 3D points and the surface normals .I am using the functionReprojectImageTo3D.In the stereo rectify function to find Q how do I get the The rotation matrix
between
the 1st and the 2nd cameras’ coordinate systems? I have individual rotation matrix and translation vector but how do I get it for "between the cameras?"
.Also this would give me the 3D points .Is there a method to generate the surface normals?

Given that you have the extrinsic matrix of both cameras, can't you simply take the inverse extrinsic matrix of camera 1, multiplied by the extrinsic matrix of camera 2?
Also, for a direct relation between the two cameras, take a look at the Fundamental Matrix (or, more specific, the Essential matrix). See if you can find a copy of the book Multiple View Geometry by Hartley and Zisserman.
As for the surface normals, you can compute those yourself by computing crossproducts on the corners of triangles. However, you then first need the reconstructed 3D point cloud.

Related

OpenCV - Projection, homography matrix and bird's eye view

I'd like to get homography matrix to Bird's eye view and I know the projection Matrix of the camera. Is there any relation between them?
Thanks.
A projection matrix is defined as a product of camera's intrinsic (e.g. focal length, principal points, etc.) and extrinsic (rotation and translation) matrices. The question is w.r.t what your rotation and translation are? For example, I can imagine another camera or an object in 3D with respect to which you have these rotations and translations. Otherwise your projection is just an intrinsic matrix.
Think first about the pieces of information you need to know to obtain a bird’s eye view: you need to know at least how your camera is oriented w.r.t ground surface. If you also know camera elevation you can create a metric reconstruction. But since you mentioned a homography, I assume that you consider a bird’s eye view of a flat surface since a homography maps the points on two flat surfaces, in your case the points on a flat ground to the points on your flat sensor.
Let’s consider a pinhole camera equation. It basically says that
[u, v, 1]T ~ A*[R|t][x, y, z, 1]T, where A is a camera intrinsic matrix. Now since you deal with a ground plane, you can align a new coordinate system with it by setting z=0; R|t are rotation and translation matrices from this coordinate system into your camera-aligned system;
Next, note that your R|t is a 3x4 matrix and it looses one dimension since z=0; it becomes 3x3 or Homography which is equal now to H=A*R’|t; Ok all we did was proving that a homography mapping existed between the ground and your sensor;
Now, you want another kind of homography that happens during pure camera rotations and zooms between points on sensor before and after rotations/zoom; that is you want to rotate the camera down and possibly zoom out. Again, think in terms of a pinhole camera equation: originally you had H1=A ( here I threw out R|T as irrelevant for now) and then you rotated your camera and you have H2=AR; in other words, H1 is how you make your image now and H2 is how you want your image look like.
The relations between two is what you want to find, H12, and it is also a homography since the Homography is a family of transformations (use this simple heuristics: what happens in a family stays in the family). Since the same surface can generate images either with H1 or H2 we can assemble H12 by undoing H1 (back to the ground plane) and applying H2 (from the ground to a sensor bird’s eye view); in a way this resembles operations with vectors, you just have to respect the order of matrix application from the right to the left:
H12 = H2*H1-1=A*R*A-1=P*A-1 , where we substituted the expressions for H1, H2 and finally for a projection matrix (in case you do have it)
This is your answer, and if the rotation R is unknown it can be guessed from the camera orientation w.r.t. the ground or calculated using solvePnP() from the opeCV library. Finally, when I do this on a cell phone I just use its accelerometer readings as a good approximation since when a cell phone is not accelerated the readings represent a gravity vector which gives the rotation w.r.t. flat horizontal ground.
When you plot your bird’s eye view as an image you will notice that its boundaries turned from rectangular into some kind of a trapezoid (due to a camera frustum shape) and there are some holes at the distant locations (due to the insufficient sampling rate). You can interpolate inside the holes using wrapPerspective()

Triangulation to find distance to the object- Image to world coordinates

Localization of an object specified in the image.
I am working on the project of computer vision to find the distance of an object using stereo images.I followed the following steps using OpenCV to achieve my objective
1. Calibration of camera
2. Surf matching to find fundamental matrix
3. Rotation and Translation vector using svd as method is described in Zisserman and Hartley book.
4. StereoRectify to get the projection matrix P1, P2 and Rotation matrices R1, R2. The Rotation matrices can also be find using Homography R=CameraMatrix.inv() H Camera Matrix.
Problems:
i triangulated point using least square triangulation method to find the real distance to the object. it returns value in the form of [ 0.79856 , .354541 .258] . How will i map it to real world coordinates to find the distance to an object.
http://www.morethantechnical.com/2012/01/04/simple-triangulation-with-opencv-from-harley-zisserman-w-code/
Alternative approach:
Find the disparity between the object in two images and find the depth using the given formula
Depth= ( focal length * baseline ) / disparity
for disparity we have to perform the rectification first and the points must be undistorted. My rectification images are black.
Please help me out.It is important
Here is the detail explanation of how i implemented the code.
Calibration of Camera using Circles grid to get the camera matrix and Distortion coefficient. The code is given on the Github (Andriod).
2.Take two pictures of a car. First from Left and other from Right. Take the sub-image and calculate the -fundmental matrix- essential matrix- Rotation matrix- Translation Matrix....
3.I have tried to projection in two ways.
Take the first image projection as identity matrix and make a second project 3x4d through rotation and translation matrix and perform Triangulation.
Get the Projection matrix P1 and P2 from Stereo Rectify to perform Triangulation.
My object is 65 meters away from the camera and i dont know how to calculate this true this based on the result of triangulation in the form of [ 0.79856 , .354541 .258]
Question: Do i have to do some extra calibration to get the result. My code is not based to know the detail of geometric size of the object.
So you already computed the triangulation? Well, then you have points in camera coordinates, i.e. in the coordinate frame centered on one of the cameras (the left or right one depending on how your code is written and the order in which you feed your images to it).
What more do you want? The vector length (square root of the sum of the square coordinates) of those points is their estimated distance from the same camera. If you want their position in some other "world" coordinate system, you need to give the coordinate transform between that system and the camera - presumably through a calibration procedure.

Consistency of projecting points onto an undistorted image

I want to project a point in 3D space into 2D image coordinates. I have the calibrated intrinsics and extrinsics of the camera I'm using. I have the camera matrix K and distortion coefficients D. However, I want the projected image coordinates to be of the undistorted image.
From my research, I found two ways to do this.
Use opencv's getOptimalNewCameraMatrix function to compute a new undistorted image's camera matrix K'. Then use this K' in opencv's projectPoints function, with the distortion parameters set to 0, to get the projected point.
Use projectPoints function using the raw camera matrix K, along with the distortion coefficients D in this function and get the projected point.
Should the output of both methods match?
I think that there is something missing in your thought.
Camera matrix K and dist. coefficent D are the parameters for make the undistortion (if your lens is distorting the image like in a fisheye). They are what is called intrinsic camera parameters.
If we change terms from computer vision to computer graphics, those parameters are the one you use for defining the frustum of the view, and, for example, they are used for getting the focal length of the camera.
But they are not enough to do the projection stuff.
For the projection, if you think in a computer graphics term (like opengl, for instance) you need to have the model-view-projection matrix. The model matrix is the matrix that specifies the position of the object in the world. The view matrix specifies the position of the camera, and the projection matrix specify the frustum (focal angle, perspective distortion, etc).
If you want to know how to transform the points of the model from 3d to 2d (or viceversa) you need the projection and the view matrixes (you have the model matrix because you have the 3d points from which you want to start). And in computer vision the view matrix is called estrinsic parameters.
So, you need the estrinsic parameters too, that are the position of the camera in the world. That is, for instance, those parameters are the rvec and tvec that cv:: projectPoints needs.
If you want to compute them, they are exactly the output of cv::solvePnP that do the opposite of what you want to do: from some known 3d points coupled with the known 2d projection on them on the camera screen, this function gives you the estrinsic parameters (from which you can get the view matrix for some opengl-opencv-augmented-reality-whatever application via cv::Rodrigues).
Last note: while the instrinsic parameters are fixed in all the pictures you shoot with a camera (while you don't change the focal length of course), the estrinisc parameters changes every time you move the camera for take a new picture from a different view point (that is: this changes the perspective point of view, so the 3D-2D projection you want to find)
Hope could help!

Find projection matrix

at first I want to apologize for my bad English.
I am really new in OpenCV and in virtual reality. I tried to find out the theory of image processing, but some points are missing there for me. I learned that projection matrix is matrix to transform 3D point to 2D. Am I right? Essential matrix gives me information about rotation between two cameras and fundamental matrix gives information about the relationship between pixel in one image with pixel in other image. The homography matrix relates coordinates of pixel in two image (is that correct?).
What is the difference between fundamental and homography matrix?
Do I need all these matrices to get projection matrix?
I am new in these, so please if you can, try to explain me it simply.
Thanks for your help.
I learned that projection matrix is matrix to transform 3D point to 2D. Am I right?
Yes. But usually these transformations are expressed in homogeneous coordinates. This means that 3D points are represented by 4-vectors (ie vectors of length 4), and 2D points are represented by 3-vectors.
The homography matrix relates coordinates of pixel in two image (is that correct?)
No. This is true in two special cases only: when the scene lie on a plane, or when the two views have been generated by two cameras sharing the same center location.
In all the other cases, ie when the scene is not planar and the two cameras have different centers, there is not an homography transforming one image into the other.
What is the difference between fundamental and homography matrix?
There are many differences. From an algebraic point of view, the most obvious difference is that an homography matrix is non-singular (its rank is 3), while a fundamental matrix is singular (its rank is 2).

Project 2d points in camera 1 image to camera 2 image after a stereo calibration

I am doing stereo calibration of two cameras (let's name them L and R) with opencv. I use 20 pairs of checkerboard images and compute the transformation of R with respect to L. What I want to do is use a new pair of images, compute the 2d checkerboard corners in image L, transform those points according to my calibration and draw the corresponding transformed points on image R with the hope that they will match the corners of the checkerboard in that image.
I tried the naive way of transforming the 2d points from [x,y] to [x,y,1], multiply by the 3x3 rotation matrix, add the rotation vector and then divide by z, but the result is wrong, so I guess it's not that simple (?)
Edit (to clarify some things):
The reason I want to do this is basically because I want to validate the stereo calibration on a new pair of images. So, I don't actually want to get a new 2d transformation between the two images, I want to check if the 3d transformation I have found is correct.
This is my setup:
I have the rotation and translation relating the two cameras (E), but I don't have rotations and translations of the object in relation to each camera (E_R, E_L).
Ideally what I would like to do:
Choose the 2d corners in image from camera L (in pixels e.g. [100,200] etc).
Do some kind of transformation on the 2d points based on matrix E that I have found.
Get the corresponding 2d points in image from camera R, draw them, and hopefully they match the actual corners!
The more I think about it though, the more I am convinced that this is wrong/can't be done.
What I am probably trying now:
Using the intrinsic parameters of the cameras (let's say I_R and I_L), solve 2 least squares systems to find E_R and E_L
Choose 2d corners in image from camera L.
Project those corners to their corresponding 3d points (3d_points_L).
Do: 3d_points_R = (E_L).inverse * E * E_R * 3d_points_L
Get the 2d_points_R from 3d_points_R and draw them.
I will update when I have something new
It is actually easy to do that but what you're making several mistakes. Remember after stereo calibration R and L relate the position and orientation of the second camera to the first camera in the first camera's 3D coordinate system. And also remember to find the 3D position of a point by a pair of cameras you need to triangulate the position. By setting the z component to 1 you're making two mistakes. First, most likely you have used the common OpenCV stereo calibration code and have given the distance between the corners of the checker board in cm. Hence, z=1 means 1 cm away from the center of camera, that's super close to the camera. Second, by setting the same z for all the points you are saying the checker board is perpendicular to the principal axis (aka optical axis, or principal ray), while most likely in your image that's not the case. So you're transforming some virtual 3D points first to the second camera's coordinate system and then projecting them onto the image plane.
If you want to transform just planar points then you can find the homography between the two cameras (OpenCV has the function) and use that.

Resources