Exact definition of the matrices in OpenCV stereoRectify

Normally the definition of a projection matrix P is the 3x4 matrix that projects a point from world coordinates to image/pixel coordinates. The projection matrix can be split into:
K: a 3x4 camera matrix with the intrinsic parameters
T: a 4x4 transformation matrix with the extrinsic parameters
The projection matrix is then P = K * T.
What are the clear definitions of the following inputs to OpenCV's stereoRectify:
cameraMatrix1 – First camera matrix (I assume it is the intrinsic K part of the projection matrix, correct?).
R – Rotation matrix between the coordinate systems of the first and the second cameras. (What does 'between' mean here? Is it the rotation from cam1 to cam2 or from cam2 to cam1?)
T – Translation vector between the coordinate systems of the cameras. (Same as above: is the translation from cam1 -> cam2 or cam2 -> cam1?)
R1 – Output 3x3 rectification transform (rotation matrix) for the first camera. (Is this the rotation after rectification, so that the new extrinsic part of the projection matrix becomes T1new = R1*T1old?)
P1 – Output 3x4 projection matrix in the new (rectified) coordinate system for the first camera. (What is meant by 'projection matrix in the new coordinate system'? It seems that this projection matrix depends on the rotation matrix R1 to project points from world coordinates to image/pixel coordinates, so from the above definition it is neither the 'projection matrix' nor the 'camera matrix' but some kind of mixture of the two.)

cameraMatrix1 is the intrinsic K matrix as computed by the stereoCalibrate() function in OpenCV. You got it right!
R is the rotation matrix of the cam2 frame w.r.t. the cam1 frame. Similarly, T is the translation vector of the cam2 origin w.r.t. the cam1 origin.
If you look in the O'Reilly book "Learning OpenCV", p. 434, you'll understand what R1 (Rl) and R2 (Rr) are.
Rl = [Rrect][rl];  Rr = [Rrect][rr];
Let the cameras' image planes be plane1 and plane2. Before stereo rectification, plane1 and plane2 are in general not parallel, and the epipolar lines are not parallel to the stereo camera baseline. What Rl does is transform the left image plane so that it becomes parallel to the right image plane (which is transformed by Rr), and the epipolar lines in both images also become parallel.
P1 and P2 are the new projection matrices after stereo rectification. Remember, the camera matrix K maps a point in 3D space onto the 2D image plane; P1 and P2 map a point in 3D space onto the rectified 2D image planes.
If you have calibrated a stereo camera rig before and looked at the P1 and K1 values, you'll find they are pretty similar when your stereo rig is already close to the rectified configuration (as close as you can get by hand).
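For reference, here is a minimal sketch of a stereoRectify() call in Python, with made-up calibration values standing in for the real output of stereoCalibrate() (the intrinsics, distortion coefficients, R, T and image size below are all placeholders):

import numpy as np
import cv2

image_size = (640, 480)                      # (width, height) of the images
K1 = K2 = np.array([[500., 0., 320.],
                    [0., 500., 240.],
                    [0., 0., 1.]])           # intrinsic matrices (cameraMatrix1/2)
d1 = d2 = np.zeros(5)                        # pretend the lenses are distortion-free
R = np.eye(3)                                # rotation relating the two camera frames
T = np.array([-0.06, 0., 0.])                # translation between the frames (6 cm baseline)

R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
    K1, d1, K2, d2, image_size, R, T, alpha=0)

# R1/R2 rotate the camera frames so that both image planes become parallel;
# P1/P2 then project 3D points given in the rectified frames to rectified pixels.
print("P1 =\n", P1)
print("P2 =\n", P2)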

Related

Difference between undistortPoints() and projectPoints() in OpenCV

From my understanding, undistortPoints takes a set of points on a distorted image, and calculates where their coordinates would be on an undistorted version of the same image. Likewise, projectPoints maps a set of object coordinates to their corresponding image coordinates.
However, I am unsure whether projectPoints maps the object coordinates to a set of image points on the distorted image (i.e. the original image) or on one that has been undistorted (straight lines)?
Furthermore, the OpenCV documentation for undistortPoints states that 'the function performs a reverse transformation to projectPoints()'. Could you please explain how this is so?
Quote from the 3.2 documentation for projectPoints():
Projects 3D points to an image plane.
The function computes projections of 3D points to the image plane given intrinsic and extrinsic camera parameters.
You have the parameter distCoeffs:
If the vector is empty, the zero distortion coefficients are assumed.
With no distortion the equation is:
s * [u, v, 1]^T = K * [R | t] * [X, Y, Z, 1]^T
with K the intrinsic matrix and [R | t] the extrinsic matrix, i.e. the transformation that maps a point from the object or world frame to the camera frame.
For undistortPoints(), you have the parameter R:
Rectification transformation in the object space (3x3 matrix). R1 or R2 computed by cv::stereoRectify can be passed here. If the matrix is empty, the identity transformation is used.
The reverse transformation is the operation where you compute, for a 2D image point [u, v], the corresponding 3D point in the normalized camera frame [x, y, z=1] using the intrinsic parameters.
With the extrinsic matrix, you can get the point in the camera frame:
[Xc, Yc, Zc]^T = R * [X, Y, Z]^T + t
The normalized camera frame is obtained by dividing by the depth:
[x, y, 1]^T = [Xc/Zc, Yc/Zc, 1]^T
Assuming no distortion, the image point is:
[u, v, 1]^T = K * [x, y, 1]^T
And the "reverse transformation" assuming no distortion:
[x, y, 1]^T = K^-1 * [u, v, 1]^T

3D to 2D transformation in OpenCV

I have a set of known 3D points in the world coordinate system, and I know the corresponding 2D points in the image.
Now, for a new 3D coordinate (x, y, z), I need to find the 2D image coordinate (u, v). How can I do that in OpenCV? How can I find the transformation matrix (camera matrix, rotation, translation) using OpenCV?
First you need to read about the fundamental matrix and epipolar geometry, and understand how the projection of world coordinates onto the image plane is done.
From the first part of your question it seems you already have this projection matrix. For any new world coordinate, just apply it.
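If you do not have the projection matrix yet, one common route in OpenCV is to recover the extrinsics from the known 3D/2D correspondences with solvePnP() and then project new points with projectPoints(). A minimal sketch, where the correspondences are synthesized from a known pose only to keep the example self-contained (replace them with your measured data):

import numpy as np
import cv2

object_pts = np.array([[0., 0., 0.], [1., 0., 0.],
                       [1., 1., 0.], [0., 1., 0.],
                       [0., 0., 1.], [1., 1., 1.]])   # known 3D world points
K = np.array([[500., 0., 320.],
              [0., 500., 240.],
              [0., 0., 1.]])                           # intrinsic camera matrix
dist = np.zeros(5)

# Stand-in for your measured 2D points: project the 3D points with a known pose.
true_rvec = np.array([0.1, -0.2, 0.05])
true_tvec = np.array([0.2, -0.1, 5.0])
image_pts, _ = cv2.projectPoints(object_pts, true_rvec, true_tvec, K, dist)
image_pts = image_pts.reshape(-1, 2)

# Recover rotation and translation from the correspondences.
ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, dist)

# Project any new 3D point (x, y, z) to its image coordinate (u, v).
new_point = np.array([[0.5, 0.5, 0.5]])
uv, _ = cv2.projectPoints(new_point, rvec, tvec, K, dist)
print("(u, v) =", uv.ravel())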

rvec/tvec from cv::calibrateCamera and R1 from cv::stereoRectify

Related posts:
Exact definition of the matrices in OpenCV stereoRectify
What is the camera frame of the rvec and tvec calculated by cv::calibrateCamera? Is it that of the original (distorted) camera or of the undistorted one? Does the camera coordinate system change when the image is undistorted (not rectified)?
What is the R1 from cv::stereoRectify()? To my understanding, R1 rotates the left camera coordinate frame (O_c) to a frontal-parallel camera frame (O_cr) so that the image is rectified (row-aligned with the right one). In other words, applying R1 to 3D points in O_cr results in points in O_c (or is it the other way around?).
A few posts and the OpenCV book have tried to explain it, but I just want to confirm that I understand it correctly, as the explanation in terms of rotating the image plane is confusing to me.
Thanks!
I can only reply to 1)
rvec and tvec describe the camera pose expressed in your calibration pattern's coordinate system. For each calibration pose you get an individual rvec and tvec.
Undistortion does not influence the camera position. Pixel positions are modified by using the radial and tangential distortion parameters (distCoeffs) resulting from the camera calibration.
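Concretely, each rvec/tvec pair transforms a point from the calibration pattern's coordinate system into the camera frame; a minimal sketch with made-up values (Rodrigues() turns the rotation vector into a 3x3 matrix):

import numpy as np
import cv2

rvec = np.array([0.1, -0.2, 0.05])     # one pose returned by calibrateCamera (made up)
tvec = np.array([0.02, -0.01, 0.50])

R, _ = cv2.Rodrigues(rvec)             # rotation vector -> 3x3 rotation matrix

X_pattern = np.array([0., 0., 0.])     # e.g. the first chessboard corner
X_camera = R @ X_pattern + tvec        # the same point expressed in the camera frame

cam_center = -R.T @ tvec               # camera centre in the pattern's coordinate system
print(X_camera, cam_center)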

How to change world space camera orientation and position vectors to the camera space?

My problem is exactly the reverse of How to determine world coordinates of a camera?.
If R and t are the orientation and position vectors of the camera in world space, how do I easily transform back to the same space as the rvec and tvec?
If you say you do have an R and t in world space, then this is not very accurate.
However, let us assume that R and t (a 3x3 rotation matrix and a 3x1 translation vector) represent the orientation and translation used to transform a point Xw in world space to a point Xc in camera space (no homogeneous coordinates):
Xc = R*Xw + t
These are the R and t, which are part of your projection matrix P (which is applicable with homogeneous coordinates) and the result of solvePnP (see Note at the bottom):
P = K[R|t]
The t is the world origin in camera space.
The other way round, transforming a point from camera space to world space, is easily derived, since the inverse of the orthogonal matrix R is R' (R transposed):
Xw = R'*Xc - R'*t
As you can see, R' (a 3x3 matrix) and -R'*t (a 3x1 vector) are now the orientation matrix and translation vector, respectively, that transform from camera to world space (more precisely, -R'*t is the camera origin/center in world space, often referred to as C).
Note: rvec is in the form of a rotation vector; as you may know, Rodrigues() is used to switch between the two representations.
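A minimal sketch of these relations with NumPy and OpenCV, using made-up rvec/tvec values:

import numpy as np
import cv2

rvec = np.array([0.2, -0.1, 0.3])
tvec = np.array([0.5, 0.0, 2.0])
R, _ = cv2.Rodrigues(rvec)             # rotation vector -> 3x3 rotation matrix

Xw = np.array([1.0, 2.0, 3.0])         # a point in world space

Xc = R @ Xw + tvec                     # world -> camera:  Xc = R*Xw + t
Xw_back = R.T @ Xc - R.T @ tvec        # camera -> world:  Xw = R'*Xc - R'*t
C = -R.T @ tvec                        # camera centre in world space

print(np.allclose(Xw, Xw_back), C)     # True, plus the camera position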

finding the real world coordinates of an image point

I have been searching through lots of resources on the internet for many days, but I couldn't solve the problem.
I have a project in which I am supposed to detect the position of a circular object on a plane. Since the object is on a plane, all I need is its x and y position (not z). For this purpose I have chosen to go with image processing. The camera (single view, not stereo) position and orientation are fixed with respect to a reference coordinate system on the plane, and they are known.
I have detected the image pixel coordinates of the centers of the circles using OpenCV. All I need now is to convert these coordinates to the real world.
http://www.packtpub.com/article/opencv-estimating-projective-relations-images
On this site, and others as well, the transformation is given as:
p = C[R|T]P, where P is the real-world coordinate and p is the pixel coordinate (in homogeneous coordinates). C is the camera matrix representing the intrinsic parameters, R is the rotation matrix and T is the translation matrix. I have followed a tutorial on calibrating the camera in OpenCV (using the cameraCalibration sample), I have 9 good chessboard images, and as output I have the intrinsic camera matrix and the translation and rotation parameters for each of the images.
I have the 3x3 intrinsic camera matrix (focal lengths and center pixel), and a 3x4 extrinsic matrix [R|T], in which R is the left 3x3 part and T is the right 3x1 part. According to the formula p = C[R|T]P, I assume that by multiplying these parameter matrices with P (world) we get p (pixel). But what I need is to project the p (pixel) coordinates back to P (world coordinates) on the ground plane.
I am studying electrical and electronics engineering. I did not take image processing or advanced linear algebra classes. As I remember from my linear algebra course, we could manipulate such a transformation as P = [R|T]^-1 * C^-1 * p. However, this is in a Euclidean coordinate system, and I don't know whether such a thing is possible with homogeneous coordinates. Moreover, the 3x4 matrix [R|T] is not invertible. And I don't know whether this is the correct way to go.
The intrinsic and extrinsic parameters are known; all I need is the real-world coordinate of the point on the ground plane. Since the point is on a plane, its coordinates have two dimensions (depth is not important, unlike in general single-view geometry). The camera is fixed (position and orientation). How should I find the real-world coordinate of a point in an image captured by a (single-view) camera?
EDIT
I have been reading "Learning OpenCV" by Gary Bradski & Adrian Kaehler. On page 386, under the Calibration -> Homography section, it is written: q = sMWQ, where M is the camera intrinsic matrix, W is the 3x4 [R|T], and s is an "up to" scale factor (which I assume is related to the homography concept; I don't know exactly); q is the pixel coordinate and Q is the real-world coordinate. It is said that, in order to get the real-world coordinate (on the chessboard plane) of an object detected in the image plane, you set Z = 0, so the third column of W is not needed (the rotation about that axis, I assume); trimming this unnecessary part, W becomes a 3x3 matrix and H = MW is a 3x3 homography matrix. Now we can invert the homography matrix and left-multiply q with it to get Q = [X Y 1], where the Z coordinate has been trimmed.
I applied the algorithm mentioned above, and I got some results that cannot lie within the image corners (the image plane was parallel to the camera plane, just ~30 cm in front of the camera, and I got results like 3000; the chessboard square sizes were entered in millimeters, so I assume the output real-world coordinates are again in millimeters). Anyway, I am still trying things. By the way, the results were at first very, very large, but I divide all values in Q by the third component of Q to get (X, Y, 1).
FINAL EDIT
I could not get the camera calibration methods to work. Anyway, I should have started with perspective projection and transforms. This way I got very good estimates with a perspective transform between the image plane and the physical plane (the transform was generated from 4 pairs of corresponding coplanar points on the two planes). Then I simply applied the transform to the image pixel points.
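For reference, a minimal sketch of that final approach with hypothetical correspondences (use your own 4 measured point pairs):

import numpy as np
import cv2

# 4 pixel positions of known markers on the plane ...
img_pts = np.float32([[120, 90], [520, 95], [530, 400], [110, 395]])
# ... and their physical positions on the plane, e.g. in millimetres.
world_pts = np.float32([[0, 0], [300, 0], [300, 200], [0, 200]])

H = cv2.getPerspectiveTransform(img_pts, world_pts)   # 3x3 plane-to-plane homography

# Map the detected circle centre(s) from pixels to plane coordinates.
circle_px = np.float32([[[315, 240]]])                # shape (N, 1, 2)
circle_mm = cv2.perspectiveTransform(circle_px, H)
print(circle_mm.ravel())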
You said "i have the intrinsic camera matrix, and translational and rotational params of each of the image.” but these are translation and rotation from your camera to your chessboard. These have nothing to do with your circle. However if you really have translation and rotation matrices then getting 3D point is really easy.
Apply the inverse intrinsic matrix to your screen points in homogeneous notation: C^-1 * [u, v, 1]^T, where u = col - w/2 and v = h/2 - row (col, row are the image column and row, and w, h are the image width and height). As a result you will obtain a 3D point in so-called normalized camera coordinates, p = [x, y, z]^T. All you need to do now is subtract the translation and apply a transposed rotation: P = R^T * (p - T). The order of operations is the inverse of the original one, which was rotate and then translate; note that the transposed rotation performs the inverse of the original rotation but is much faster to compute than R^-1.
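A minimal sketch of this back-projection, extended with the ground-plane constraint (Z = 0) from the question so the viewing ray yields a single point; K, R and T are made-up stand-ins for your calibration results:

import numpy as np

K = np.array([[500., 0., 320.],
              [0., 500., 240.],
              [0., 0., 1.]])
# Made-up extrinsics: camera 1 unit above the plane, looking straight down.
R = np.array([[1., 0., 0.],
              [0., -1., 0.],
              [0., 0., -1.]])
T = np.array([0., 0., 1.])

uv = np.array([420., 190.])                         # detected circle centre (pixels)

# Step 1 (as in the answer): normalized camera ray  p = C^-1 * [u, v, 1]^T
p = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.])

# Step 2: a camera-frame point at depth s maps to the world via  P = R^T * (s*p - T);
# choose s so that the world Z coordinate is 0, i.e. the point lies on the plane.
s = (R.T @ T)[2] / (R.T @ p)[2]
P = R.T @ (s * p - T)

print(P)                                            # world coordinates on the plane, Z ~ 0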
