3D to 2D transformation in OpenCV

I have a set of known 3D points in the world coordinate system and I know the corresponding 2D points in the image.
Now, for a new 3D coordinate (x, y, z), I need to find the 2D image coordinate (u, v). How can I find that in OpenCV? How can I find the transformation matrix (camera matrix, rotation, translation) using OpenCV?

First you need to read about the fundamental matrix and epipolar geometry, and understand how the projection of world coordinates onto the image plane is done.
From the first part of your question it seems you already have this projection matrix. For any new world coordinates, just use this matrix.
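For completeness, here is a minimal sketch of that workflow in Python/OpenCV, assuming the intrinsics are already known (the point lists, camera matrix and distortion coefficients below are placeholders for your own data): recover the extrinsics from the known 3D-2D correspondences with solvePnP, then project the new point with projectPoints.

    import numpy as np
    import cv2

    # Known 3D world points and their observed 2D pixel locations (placeholder data).
    object_points = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
                              [0, 0, 1], [1, 0, 1]], dtype=np.float32)
    image_points = np.array([[320, 240], [400, 238], [402, 310], [322, 312],
                             [318, 180], [398, 178]], dtype=np.float32)

    # Intrinsics and distortion, e.g. from a previous cv2.calibrateCamera run.
    camera_matrix = np.array([[800, 0, 320],
                              [0, 800, 240],
                              [0, 0, 1]], dtype=np.float64)
    dist_coeffs = np.zeros(5)

    # Recover rotation and translation from the correspondences.
    ok, rvec, tvec = cv2.solvePnP(object_points, image_points, camera_matrix, dist_coeffs)

    # Project a new world point (x, y, z) into the image to get (u, v).
    new_point = np.array([[0.5, 0.5, 0.0]], dtype=np.float32)
    projected, _ = cv2.projectPoints(new_point, rvec, tvec, camera_matrix, dist_coeffs)
    u, v = projected.reshape(-1, 2)[0]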

Related

How can I find 3d-affine transformation of my 3D model if I have 2D points of its projection?

I have the 3D points of my model, and I have 2D points that are the projection of these 3D points onto a plane. I want to find the 3D affine transformation (translation, rotation and scale) of the 3D model such that the projection of the transformed model gives the same 2D points on the plane as the ones I have.
Just find the null space of your projection matrix, e.g. in MATLAB you can use u = null(P) (in Python see NumPy/SciPy: finding the null space of a matrix in numpy). This will be a single vector, as P projects one dimension down from 3D space.
An affine transformation satisfying P*A=P (where P is the projection and A is the affine transformation) would be A=([u u ... u]+I), where you form a matrix from the nullspace vector u to match the dimension of A (likely 4x4 to include translation).
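A small NumPy/SciPy sketch of this construction, using a placeholder 3x4 projection matrix P:

    import numpy as np
    from scipy.linalg import null_space

    # Placeholder 3x4 projection matrix; substitute your own.
    P = np.array([[800, 0, 320, 0],
                  [0, 800, 240, 0],
                  [0, 0, 1, 0]], dtype=float)

    u = null_space(P)                    # 4x1 vector spanning the null space of P
    A = np.tile(u, (1, 4)) + np.eye(4)   # A = [u u u u] + I, a 4x4 matrix

    # P @ A equals P, so the transformed model projects to the same 2D points.
    assert np.allclose(P @ A, P)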

Exact definition of the matrices in OpenCv StereoRectify

Normally the definition of a projection matrix P is the 3x4 matrix which projects point from world coordinates to image/pixel coordinates. The projection matrix can be split up into:
K: a 3x4 camera matrix with the intrinsic parameters (the 3x3 intrinsic matrix augmented with a zero column)
T: a 4x4 transformation matrix with the extrinsic parameters
The projection matrix is then P = K * T.
What are the clear definitions of the following input to OpenCV's stereoRectify:
cameraMatrix1 – First camera matrix (I assume it is the intrinsic K part of the projection matrix, correct?).
R – Rotation matrix between the coordinate systems of the first and the second cameras. (What does 'between' mean? Is it the rotation from cam1 to cam2 or from cam2 to cam1?)
T – Translation vector between the coordinate systems of the cameras. (Same as above: is the translation from cam1 -> cam2 or cam2 -> cam1?)
R1 – Output 3x3 rectification transform (rotation matrix) for the first camera. (Is this the rotation after rectification so the new extrinsic part of the projection matrix becomes T1new = R1*T1old?)
P1 – Output 3x4 projection matrix in the new (rectified) coordinate systems for the first camera. (What is meant by 'projection matrix in the new coordinate system'? It seems that this projection matrix is dependent on the rotation matrix R1 to project point from world coordinates to image/pixel coordinates, so from the above definition it is neither the 'projection matrix' or the 'camera matrix' but some kind of mixture of the two)
cameraMatrix1 is the intrinsic K matrix as computed by the stereoCalibrate() function in OpenCV, so you got that right.
R is the rotation matrix of the cam2 frame w.r.t. the cam1 frame. Similarly, T is the translation vector of the cam2 origin w.r.t. the cam1 origin.
If you look at the O'Reilly book "Learning OpenCV", p. 434, you'll understand what R1 (Rl) and R2 (Rr) are:
Rl = [Rrect][rl]; Rr = [Rrect][rr]
Let the cameras' image planes be plane1 and plane2. Before stereo rectification, plane1 and plane2 will generally not be parallel, and the epipolar lines will not be parallel to the stereo baseline. What Rl does is transform the left image plane so that it becomes parallel to the right image plane (which is transformed by Rr), and the epipolar lines in both images become parallel.
P1 and P2 are the new projection matrices after stereo rectification. Remember, the camera matrix (K) maps a point in 3D space onto the 2D image plane, whereas P1 and P2 map a point in 3D space onto the rectified 2D image planes.
If you have calibrated a stereo rig before and looked at the P1 and K1 values, you'll find that they are quite similar whenever your rig is already close to the rectified configuration.
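A hedged sketch of the call in Python, assuming K1, D1, K2, D2, R and T come from a previous cv2.stereoCalibrate run (all numeric values below are placeholders):

    import numpy as np
    import cv2

    image_size = (640, 480)
    K1 = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float64)
    K2 = K1.copy()
    D1 = np.zeros(5)
    D2 = np.zeros(5)
    R = np.eye(3)                           # rotation of the cam2 frame w.r.t. cam1
    T = np.array([[-0.06], [0.0], [0.0]])   # ~6 cm baseline along x (placeholder)

    R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, D1, K2, D2, image_size, R, T)

    # R1/R2 are the rectifying rotations; P1/P2 project 3D points directly
    # onto the rectified image planes.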

finding the real world coordinates of an image point

I have been searching through lots of resources on the internet for many days but I couldn't solve the problem.
I have a project in which I am supposed to detect the position of a circular object on a plane. Since the object lies on a plane, all I need is its x and y position (not z). For this purpose I have chosen to go with image processing. The camera (single view, not stereo) position and orientation are fixed and known with respect to a reference coordinate system on the plane.
I have detected the image pixel coordinates of the circle centers using OpenCV. All I need now is to convert those coordinates to the real world.
http://www.packtpub.com/article/opencv-estimating-projective-relations-images
On this site, and other sites as well, the transformation is given as:
p = C[R|T]P, where P is the real-world coordinate and p is the pixel coordinate (in homogeneous coordinates). C is the camera matrix representing the intrinsic parameters, R is the rotation matrix and T is the translation vector. I have followed a tutorial on calibrating the camera with OpenCV (using the cameraCalibration source file); I have 9 good chessboard images, and as output I have the intrinsic camera matrix and the translation and rotation parameters of each image.
I have the 3x3 intrinsic camera matrix (focal lengths and principal point) and a 3x4 extrinsic matrix [R|T], in which R is the left 3x3 block and T the right 3x1 column. According to the p = C[R|T]P formula, I assume that by multiplying these parameter matrices with P (world) we get p (pixel). But what I need is to project the p (pixel) coordinate back to P (world coordinates) on the ground plane.
I am studying electrical and electronics engineering. I did not take image processing or advanced linear algebra classes. As I remember from my linear algebra course, we can rearrange the transformation as P = [R|T]^-1 * C^-1 * p. However, that holds in a Euclidean coordinate system; I don't know whether such a thing is possible with homogeneous coordinates, and in any case the 3x4 matrix [R|T] is not invertible. Moreover, I don't know whether this is the correct way to go.
Intrinsic and extrinsic parameters are known; all I need is the real-world coordinate of the point on the ground plane. Since the point lies on a plane, the coordinate has only two dimensions (depth does not matter, unlike the general single-view geometry case). The camera is fixed (position and orientation). How should I find the real-world coordinate of a point in an image captured by a single-view camera?
EDIT
I have been reading "Learning OpenCV" by Gary Bradski & Adrian Kaehler. On page 386, in the Calibration -> Homography section, it is written: q = sMWQ, where M is the camera intrinsic matrix, W is the 3x4 [R|T], s is an "up to" scale factor (related to the homography concept, I assume; I don't fully understand it), q is the pixel coordinate and Q is the real-world coordinate. It says that in order to get the real-world coordinate (on the chessboard plane) of an object detected on the image plane, set Z = 0; then the third column of W can be dropped (since it only multiplies Z), and after trimming these unnecessary parts W becomes a 3x3 matrix. H = MW is then a 3x3 homography matrix. Now we can invert the homography matrix and left-multiply it with q to get Q = [X Y 1], where the Z coordinate has been trimmed.
I applied the algorithm above and got results that cannot lie between the image corners (the target plane was parallel to the camera plane, about 30 cm in front of the camera, and I got results like 3000; the chessboard square sizes were entered in millimeters, so I assume the output real-world coordinates are again in millimeters). Anyway, I am still trying things. By the way, the raw results were very large at first, but I divide all values in Q by its third component to get (X, Y, 1).
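A sketch of that Z = 0 homography in Python/OpenCV, assuming camera_matrix (M), rvec and tvec come from the calibration of the image whose chessboard plane you want to measure on (the numeric values are placeholders):

    import numpy as np
    import cv2

    camera_matrix = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=float)
    rvec = np.array([[0.1], [-0.2], [0.05]])
    tvec = np.array([[10.0], [-5.0], [300.0]])   # same units as the board squares

    R, _ = cv2.Rodrigues(rvec)
    W = np.hstack((R[:, :2], tvec))   # drop the third rotation column since Z = 0
    H = camera_matrix @ W             # 3x3 homography: board plane -> image

    q = np.array([350.0, 260.0, 1.0]) # detected pixel in homogeneous coordinates
    Q = np.linalg.inv(H) @ q
    Q /= Q[2]                         # normalize so Q = [X, Y, 1]
    X, Y = Q[0], Q[1]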
FINAL EDIT
I could not get the camera calibration methods to work. Anyway, I should have started with perspective projection and a perspective transform. That way I got very good estimates with a perspective transform between the image plane and the physical plane (generating the transform from 4 pairs of corresponding coplanar points on the two planes), and then simply applied the transform to the image pixel points.
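A minimal sketch of that four-point approach (all coordinates below are placeholders):

    import numpy as np
    import cv2

    # Four image pixels and the matching four points on the physical plane (e.g. mm).
    img_pts   = np.array([[100, 100], [500, 120], [480, 400], [90, 380]], dtype=np.float32)
    world_pts = np.array([[0, 0], [400, 0], [400, 300], [0, 300]], dtype=np.float32)

    M = cv2.getPerspectiveTransform(img_pts, world_pts)

    # Map detected circle centers (pixels) onto the plane.
    centers = np.array([[[250, 240]], [[310, 200]]], dtype=np.float32)
    plane_coords = cv2.perspectiveTransform(centers, M)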
You said "i have the intrinsic camera matrix, and translational and rotational params of each of the image.” but these are translation and rotation from your camera to your chessboard. These have nothing to do with your circle. However if you really have translation and rotation matrices then getting 3D point is really easy.
Apply the inverse intrinsic matrix to your screen points in homogeneous notation: C-1*[u, v, 1], where u=col-w/2 and v=h/2-row, where col, row are image column and row and w, h are image width and height. As a result you will obtain 3d point with so-called camera normalized coordinates p = [x, y, z]T. All you need to do now is to subtract the translation and apply a transposed rotation: P=RT(p-T). The order of operations is inverse to the original that was rotate and then translate; note that transposed rotation does the inverse operation to original rotation but is much faster to calculate than R-1.
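A direct transcription of those steps into Python/NumPy; the camera matrix, rotation, translation, image size and pixel below are placeholders, and since no depth is known the result corresponds to the point at unit depth along the viewing ray, expressed in world coordinates:

    import numpy as np

    # Intrinsics with the principal point already removed (coordinates are pre-centered).
    camera_matrix = np.array([[800, 0, 0], [0, 800, 0], [0, 0, 1]], dtype=float)
    R = np.eye(3)                  # rotation (placeholder)
    T = np.array([0.0, 0.0, 1.5])  # translation (placeholder)
    w, h = 640, 480
    row, col = 200, 400

    u = col - w / 2                # centered pixel coordinates, as in the answer
    v = h / 2 - row
    p = np.linalg.inv(camera_matrix) @ np.array([u, v, 1.0])  # camera-normalized point

    P = R.T @ (p - T)              # transposed rotation undoes the original rotation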

Converting a 2D image point to a 3D world point

I know that in the general case, making this conversion is impossible since depth information is lost going from 3d to 2d.
However, I have a fixed camera and I know its camera matrix. I also have a planar calibration pattern of known dimensions - let's say that in world coordinates it has corners (0,0,0) (2,0,0) (2,1,0) (0,1,0). Using opencv I can estimate the pattern's pose, giving the translation and rotation matrices needed to project a point on the object to a pixel in the image.
Now: this 3d to image projection is easy, but how about the other way? If I pick a pixel in the image that I know is part of the calibration pattern, how can I get the corresponding 3d point?
I could iteratively choose some random 3d point on the calibration pattern, project to 2d, and refine the 3d point based on the error. But this seems pretty horrible.
Given that this unknown point has world coordinates something like (x,y,0) -- since it must lie on the z=0 plane -- it seems like there should be some transformation that I can apply, instead of doing the iterative nonsense. My maths isn't very good though - can someone work out this transformation and explain how you derive it?
Here is a closed-form solution that I hope can help someone. Using the conventions from the image in your comment above, you can use centered-normalized pixel coordinates u and v (usually after distortion correction) and the extrinsic calibration data, like this:
|Tx|   |r11 r21 r31|   |-t1|
|Ty| = |r12 r22 r32| . |-t2|
|Tz|   |r13 r23 r33|   |-t3|

|dx|   |r11 r21 r31|   |u|
|dy| = |r12 r22 r32| . |v|
|dz|   |r13 r23 r33|   |1|
With these intermediate values, the coordinates you want are:
X = (-Tz/dz)*dx + Tx
Y = (-Tz/dz)*dy + Ty
Explanation:
The vector [t1, t2, t3]^T is the position of the origin of the world coordinate system (the (0,0) of your calibration pattern) with respect to the camera optical center; by reversing the signs and inverting the rotation we obtain the vector T = [Tx, Ty, Tz]^T, which is the position of the camera center in the world reference frame.
Similarly, [u, v, 1]^T is the direction along which the observed point lies in the camera reference frame (starting from the camera center). By inverting the rotation we obtain the vector d = [dx, dy, dz]^T, which represents the same direction in the world reference frame.
To invert the rotation we take advantage of the fact that the inverse of a rotation matrix is its transpose.
Now we have a line with direction vector d starting from point T; the intersection of this line with the plane Z = 0 is given by the second set of equations. Note that it would be just as easy to find the intersection with the X = 0 or Y = 0 planes, or with any plane parallel to them.
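The same solution written out in NumPy, with placeholder extrinsics and a placeholder centered-normalized pixel:

    import numpy as np

    R = np.eye(3)                    # rotation part of the extrinsics (placeholder)
    t = np.array([0.1, -0.2, 2.0])   # translation part of the extrinsics (placeholder)
    u, v = 0.05, -0.02               # centered-normalized pixel coordinates

    T = R.T @ (-t)                   # camera center in world coordinates
    d = R.T @ np.array([u, v, 1.0])  # viewing-ray direction in world coordinates

    X = (-T[2] / d[2]) * d[0] + T[0]
    Y = (-T[2] / d[2]) * d[1] + T[1]
    # (X, Y, 0) is where the viewing ray meets the Z = 0 plane.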
Yes, you can. If you have a transformation matrix that maps a point in the 3D world to the image plane, you can just use the inverse of this transformation to map an image-plane point back to the 3D world point. If you already know that z = 0 for the 3D world point, this results in a single solution for the point, and there is no need to iteratively choose some random 3D point. I had a similar problem where I had a camera mounted on a vehicle with a known position and camera calibration matrix, and I needed to know the real-world location of a lane marking captured on the image plane of the camera.
If you have Z = 0 for your points in world coordinates (which should be true for a planar calibration pattern), then instead of inverting the rotation you can calculate the homography between the camera image and the calibration pattern.
Once you have the homography you can select a point in the image and get its location in world coordinates using the inverse homography.
This holds as long as the point in world coordinates lies on the same plane as the points used for calculating the homography (in this case Z = 0).
This approach to this problem was also discussed below this question on SO: Transforming 2D image coordinates to 3D world coordinates with z = 0
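A hedged sketch of this homography route in Python/OpenCV (the pattern corners, their pixel locations and the query pixel are placeholders):

    import numpy as np
    import cv2

    # Z = 0 pattern corners and where they appear in the image.
    world_xy = np.array([[0, 0], [2, 0], [2, 1], [0, 1]], dtype=np.float32)
    pixels   = np.array([[102, 215], [410, 205], [405, 370], [98, 365]], dtype=np.float32)

    H, _ = cv2.findHomography(world_xy, pixels)   # plane -> image
    H_inv = np.linalg.inv(H)

    # Map an image pixel back onto the pattern plane (result has Z = 0 implicitly).
    p = np.array([[[250.0, 290.0]]], dtype=np.float32)
    world_point = cv2.perspectiveTransform(p, H_inv)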

Compute transformation matrix from a set of coordinates (with OpenCV)

I have a small cube with n (you can assume that n = 4) distinguished points on its surface. These points are numbered (1-n) and form a coordinate space, where point #1 is the origin.
Now I'm using a tracking camera to get the coordinates of those points, relative to the camera's coordinate space. That means that I now have n vectors p_i pointing from the origin of the camera to the cube's surface.
With that information, I'm trying to compute the affine transformation matrix (rotation + translation) that represents the transformation between those two coordinate spaces. The translation part is fairly trivial, but I'm struggling with the computation of the rotation matrix.
Is there any built-in functionality in OpenCV that might help me solve this problem?
Sounds like cvGetPerspectiveTransform is what you're looking for; cvFindHomography might also be helpful.
solvePnP should give you the rotation matrix and the translation vector. Try it with CV_EPNP or CV_ITERATIVE.
Edit: Or perhaps you're looking for RQ decomposition.
Look at the Stereo Camera tutorial for OpenCV. OpenCV uses a planar chessboard for all the computation and sets its Z-dimension to 0 to build its list of 3D points. You already have 3D points so change the code in the tutorial to reflect your list of 3D points. Then you can compute the transformation.
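The answers above assume you have 2D image observations; for the situation described in the question, where you already have n corresponding 3D points in both the cube frame and the camera frame, a Kabsch/SVD rigid fit is a common alternative for recovering the rotation and translation. A NumPy-only sketch with placeholder data:

    import numpy as np

    cube_pts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
    cam_pts  = np.array([[0.5, 0.2, 2.0], [1.4, 0.3, 2.1],
                         [0.4, 1.1, 2.2], [0.6, 0.1, 3.0]], dtype=float)

    # Center both point sets, fit the rotation with an SVD, then recover translation.
    c_cube, c_cam = cube_pts.mean(axis=0), cam_pts.mean(axis=0)
    H = (cube_pts - c_cube).T @ (cam_pts - c_cam)
    U, S, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid a reflection
    R = Vt.T @ D @ U.T
    t = c_cam - R @ c_cube          # cam_pts is approximately R @ cube_pts + t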
