How to calculate a perspective transform matrix in 3D with OpenCV

How to calculate a perspective transform matrix in 3D with OpenCV - opencv

I have two sets of 2D points: A and B. I can calculate a 3x3 perspective transform matrix between these sets with getPerspectiveTransform(). Let's place these sets (planes) in 3D coordinate system (each point acquires a Z-coordinate now) to get two 3D sets: A' and B'. There is no equivalent of getPerspectiveTransform() for this 3D case. What is the way to calculate a 4x4 perspective transform matrix between A' and B'? It doesn't have to be OpenCV, if there are other convenient libraries with that functionality.
NOTE
The application is a projector calibration, therefore camera models are not directly applicable. I have an accurate 2D homography between projector image source and projected image. I have much less accurate measurement of world position and orientation of the projector. I need to extend the transform to objects relatively close to the projected image plane. I know I can do it by calculating ray intersection with projected image plane, but I'm looking for a more compact solution in the form of a transform matrix.
NOTE'
I'm insisting on the transformation being a projection, because that's the real-world geometry I'm modelling. B' points lie on the plane in 3D, specifically on the image forming rectangle inside the projector. A' points are projections of B' onto a 3D surface. It may be planar, requiring additional parameters or points outside the plane for the transformation to be unique .

Related

How can I find 3d-affine transformation of my 3D model if I have 2D points of its projection?

I have 3D points of my model. And I have 2D points - projection of these 3D points of my model on plane. I want to find 3d-affine transformation (translation, rotation and scale) of 3D-model so that projection of this 3D-model give me 2D points on plane the same as I have.
How can I find 3d-affine transformation of my 3D model if I have 2D points of its projection?

Just find the null space to your projection matrix, e.g. in matlab you can use u=null(P) (or Python (NumPy, SciPy), finding the null space of a matrix in numpy). This will be a single vector, as P is projecting one dimension down from 3D space.
An affine transformation satisfying P*A=P (where P is the projection and A is the affine transformation) would be A=([u u ... u]+I), where you form a matrix from the nullspace vector u to match the dimension of A (likely 4x4 to include translation).

OpenCV - Projection, homography matrix and bird's eye view

I'd like to get homography matrix to Bird's eye view and I know the projection Matrix of the camera. Is there any relation between them?
Thanks.

A projection matrix is defined as a product of camera's intrinsic (e.g. focal length, principal points, etc.) and extrinsic (rotation and translation) matrices. The question is w.r.t what your rotation and translation are? For example, I can imagine another camera or an object in 3D with respect to which you have these rotations and translations. Otherwise your projection is just an intrinsic matrix.
Think first about the pieces of information you need to know to obtain a bird’s eye view: you need to know at least how your camera is oriented w.r.t ground surface. If you also know camera elevation you can create a metric reconstruction. But since you mentioned a homography, I assume that you consider a bird’s eye view of a flat surface since a homography maps the points on two flat surfaces, in your case the points on a flat ground to the points on your flat sensor.
Let’s consider a pinhole camera equation. It basically says that
[u, v, 1]T ~ A*[R|t][x, y, z, 1]T, where A is a camera intrinsic matrix. Now since you deal with a ground plane, you can align a new coordinate system with it by setting z=0; R|t are rotation and translation matrices from this coordinate system into your camera-aligned system;
Next, note that your R|t is a 3x4 matrix and it looses one dimension since z=0; it becomes 3x3 or Homography which is equal now to H=A*R’|t; Ok all we did was proving that a homography mapping existed between the ground and your sensor;
Now, you want another kind of homography that happens during pure camera rotations and zooms between points on sensor before and after rotations/zoom; that is you want to rotate the camera down and possibly zoom out. Again, think in terms of a pinhole camera equation: originally you had H1=A ( here I threw out R|T as irrelevant for now) and then you rotated your camera and you have H2=AR; in other words, H1 is how you make your image now and H2 is how you want your image look like.
The relations between two is what you want to find, H12, and it is also a homography since the Homography is a family of transformations (use this simple heuristics: what happens in a family stays in the family). Since the same surface can generate images either with H1 or H2 we can assemble H12 by undoing H1 (back to the ground plane) and applying H2 (from the ground to a sensor bird’s eye view); in a way this resembles operations with vectors, you just have to respect the order of matrix application from the right to the left:
H12 = H2*H1-1=A*R*A-1=P*A-1 , where we substituted the expressions for H1, H2 and finally for a projection matrix (in case you do have it)
This is your answer, and if the rotation R is unknown it can be guessed from the camera orientation w.r.t. the ground or calculated using solvePnP() from the opeCV library. Finally, when I do this on a cell phone I just use its accelerometer readings as a good approximation since when a cell phone is not accelerated the readings represent a gravity vector which gives the rotation w.r.t. flat horizontal ground.
When you plot your bird’s eye view as an image you will notice that its boundaries turned from rectangular into some kind of a trapezoid (due to a camera frustum shape) and there are some holes at the distant locations (due to the insufficient sampling rate). You can interpolate inside the holes using wrapPerspective()

Converting a 2D image point to a 3D world point

I know that in the general case, making this conversion is impossible since depth information is lost going from 3d to 2d.
However, I have a fixed camera and I know its camera matrix. I also have a planar calibration pattern of known dimensions - let's say that in world coordinates it has corners (0,0,0) (2,0,0) (2,1,0) (0,1,0). Using opencv I can estimate the pattern's pose, giving the translation and rotation matrices needed to project a point on the object to a pixel in the image.
Now: this 3d to image projection is easy, but how about the other way? If I pick a pixel in the image that I know is part of the calibration pattern, how can I get the corresponding 3d point?
I could iteratively choose some random 3d point on the calibration pattern, project to 2d, and refine the 3d point based on the error. But this seems pretty horrible.
Given that this unknown point has world coordinates something like (x,y,0) -- since it must lie on the z=0 plane -- it seems like there should be some transformation that I can apply, instead of doing the iterative nonsense. My maths isn't very good though - can someone work out this transformation and explain how you derive it?

Here is a closed form solution that I hope can help someone. Using the conventions in the image from your comment above, you can use centered-normalized pixel coordinates (usually after distortion correction) u and v, and extrinsic calibration data, like this:
|Tx| |r11 r21 r31| |-t1|
|Ty| = |r12 r22 r32|.|-t2|
|Tz| |r13 r23 r33| |-t3|
|dx| |r11 r21 r31| |u|
|dy| = |r12 r22 r32|.|v|
|dz| |r13 r23 r33| |1|
With these intermediate values, the coordinates you want are:
X = (-Tz/dz)*dx + Tx
Y = (-Tz/dz)*dy + Ty
Explanation:
The vector [t1, t2, t3]t is the position of the origin of the world coordinate system (the (0,0) of your calibration pattern) with respect to the camera optical center; by reversing signs and inversing the rotation transformation we obtain vector T = [Tx, Ty, Tz]t, which is the position of the camera center in the world reference frame.
Similarly, [u, v, 1]t is the vector in which lies the observed point in the camera reference frame (starting from camera center). By inversing the rotation transformation we obtain vector d = [dx, dy, dz]t, which represents the same direction in world reference frame.
To inverse the rotation transformation we take advantage of the fact that the inverse of a rotation matrix is its transpose (link).
Now we have a line with direction vector d starting from point T, the intersection of this line with plane Z=0 is given by the second set of equations. Note that it would be similarly easy to find the intersection with the X=0 or Y=0 planes or with any plane parallel to them.

Yes, you can. If you have a transformation matrix that maps a point in the 3d world to the image plane, you can just use the inverse of this transformation matrix to map a image plane point to the 3d world point. If you already know that z = 0 for the 3d world point, this will result in one solution for the point. There will be no need to iteratively choose some random 3d point. I had a similar problem where I had a camera mounted on a vehicle with a known position and camera calibration matrix. I needed to know the real world location of a lane marking captured on the image place of the camera.

If you have Z=0 for you points in world coordinates (which should be true for planar calibration pattern), instead of inversing rotation transformation, you can calculate homography for your image from camera and calibration pattern.
When you have homography you can select point on image and then get its location in world coordinates using inverse homography.
This is true as long as the point in world coordinates is on the same plane as the points used for calculating this homography (in this case Z=0)
This approach to this problem was also discussed below this question on SO: Transforming 2D image coordinates to 3D world coordinates with z = 0

How can I transform an image using matrices R and T (extrinsic parameters matrices) in opencv?

I have a rotation-translation matrix [R T] (3x4).
Is there a function in opencv that performs the rotation-translation described by [R T]?

A lot of solutions to this question I think make hidden assumptions. I will try to give you a quick summary of how I think about this problem (I have had to think about it a lot in the past). Warping between two images is a 2 dimensional process accomplished by a 3x3 matrix called a homography. What you have is a 3x4 matrix which defines a transform in 3 dimensions. You can convert between the two by treating your image as a flat plane in 3 dimensional space. The trick then is to decide on the initial position in world space of your image plane. You can then transform its position and project it onto a new image plane with your camera intrinsics matrix.
The first step is to decide where your initial image lies in world space, note that this does not have to be the same as your initial R and T matrices specify. Those are in world coordinates, we are talking about the image created by that world, all the objects in the image have been flattened into a plane. The simplest decision here is to set the image at a fixed displacement on the z axis and no rotation. From this point on I will assume no rotation. If you would like to see the general case I can provide it but it is slightly more complicated.
Next you define the transform between your two images in 3d space. Since you have both transforms with respect to the same origin, the transform from [A] to [B] is the same as the transform from [A] to your origin, followed by the transform from the origin to [B]. You can get that by
transform = [B]*inverse([A])
Now conceptually what you need to do is to take your first image, project its pixels onto the geometric interpretation of your image in 3d space, then transform those pixels in 3d space by the transform above, then project them back onto a new 2d image with your camera matrix. Those steps need to be combined into a single 3x3 matrix.
cv::Matx33f convert_3x4_to_3x3(cv::Matx34f pose, cv::Matx33f camera_mat, float zpos)
{
//converted condenses the 3x4 matrix which transforms a point in world space
//to a 3x3 matrix which transforms a point in world space. Instead of
//multiplying pose by a 4x1 3d homogeneous vector, by specifying that the
//incoming 3d vectors will ALWAYS have a z coordinate of zpos, one can instead
//multiply converted by a homogeneous 2d vector and get the same output for x and y.
cv::Matx33f converted(pose(0,0),pose(0,1),pose(0,2)*zpos+pose(0,3),
pose(1,0),pose(1,1),pose(1,2)*zpos+pose(1,3),
pose(2,0),pose(2,1),pose(2,2)*zpos+pose(2,3));
//This matrix will take a homogeneous 2d coordinate and "projects" it onto a
//flat plane at zpos. The x and y components of the incoming homogeneous 2d
//coordinate will be correct, the z component is dropped.
cv::Matx33f projected(1,0,0,
0,1,0,
0,0,zpos);
projected = projected*camera_mat.inv();
//now we have the pieces. A matrix which can take an incoming 2d point, and
//convert it into a pseudo 3d point (x and y correspond to 3d, z is unused)
//and a matrix which can take our pseudo 3d point and transform it correctly.
//Now we just need to turn our transformed pseudo 3d point back into a 2d point
//in our new image, to do that simply multiply by the camera matrix.
return camera_mat*converted*projected;
}
This is probably a more complicated answer than you were looking for but I hope it gives you an idea of what you are asking. This can be very confusing and I glazed over some parts of it quickly, feel free to ask for clarification. If you need the solution to work without the assumption that the initial image appears without rotation let me know, I just didn't want to make it more complicated than it needed to be.

Project 2d points in camera 1 image to camera 2 image after a stereo calibration

I am doing stereo calibration of two cameras (let's name them L and R) with opencv. I use 20 pairs of checkerboard images and compute the transformation of R with respect to L. What I want to do is use a new pair of images, compute the 2d checkerboard corners in image L, transform those points according to my calibration and draw the corresponding transformed points on image R with the hope that they will match the corners of the checkerboard in that image.
I tried the naive way of transforming the 2d points from [x,y] to [x,y,1], multiply by the 3x3 rotation matrix, add the rotation vector and then divide by z, but the result is wrong, so I guess it's not that simple (?)
Edit (to clarify some things):
The reason I want to do this is basically because I want to validate the stereo calibration on a new pair of images. So, I don't actually want to get a new 2d transformation between the two images, I want to check if the 3d transformation I have found is correct.
This is my setup:
I have the rotation and translation relating the two cameras (E), but I don't have rotations and translations of the object in relation to each camera (E_R, E_L).
Ideally what I would like to do:
Choose the 2d corners in image from camera L (in pixels e.g. [100,200] etc).
Do some kind of transformation on the 2d points based on matrix E that I have found.
Get the corresponding 2d points in image from camera R, draw them, and hopefully they match the actual corners!
The more I think about it though, the more I am convinced that this is wrong/can't be done.
What I am probably trying now:
Using the intrinsic parameters of the cameras (let's say I_R and I_L), solve 2 least squares systems to find E_R and E_L
Choose 2d corners in image from camera L.
Project those corners to their corresponding 3d points (3d_points_L).
Do: 3d_points_R = (E_L).inverse * E * E_R * 3d_points_L
Get the 2d_points_R from 3d_points_R and draw them.
I will update when I have something new

It is actually easy to do that but what you're making several mistakes. Remember after stereo calibration R and L relate the position and orientation of the second camera to the first camera in the first camera's 3D coordinate system. And also remember to find the 3D position of a point by a pair of cameras you need to triangulate the position. By setting the z component to 1 you're making two mistakes. First, most likely you have used the common OpenCV stereo calibration code and have given the distance between the corners of the checker board in cm. Hence, z=1 means 1 cm away from the center of camera, that's super close to the camera. Second, by setting the same z for all the points you are saying the checker board is perpendicular to the principal axis (aka optical axis, or principal ray), while most likely in your image that's not the case. So you're transforming some virtual 3D points first to the second camera's coordinate system and then projecting them onto the image plane.
If you want to transform just planar points then you can find the homography between the two cameras (OpenCV has the function) and use that.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart