Difference between undistortPoints() and projectPoints() in OpenCV - opencv

From my understanding, undistortPoints takes a set of points on a distorted image, and calculates where their coordinates would be on an undistorted version of the same image. Likewise, projectPoints maps a set of object coordinates to their corresponding image coordinates.
However, I am unsure whether projectPoints maps the object coordinates to a set of image points on the distorted image (i.e. the original image) or on one that has been undistorted (straight lines).
Furthermore, the OpenCV documentation for undistortPoints states that 'the function performs a reverse transformation to projectPoints()'. Could you please explain how this is so?

Quote from the 3.2 documentation for projectPoints():
Projects 3D points to an image plane.
The function computes
projections of 3D points to the image plane given intrinsic and
extrinsic camera parameters.
You have the parameter distCoeffs:
If the vector is empty, the zero distortion coefficients are assumed.
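With no distortion the equation is:
$$ s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \, [R \mid t] \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} $$
With K the intrinsic matrix, [R | t] the extrinsic matrix (the transformation that takes a point from the object or world frame to the camera frame), and s an arbitrary scale factor.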
For undistortPoints(), you have the parameter R:
Rectification transformation in the object space (3x3 matrix). R1 or R2 computed by cv::stereoRectify can be passed here. If the matrix is empty, the identity transformation is used.
The reverse transformation is the operation where you compute for a 2D image point ([u, v]) the corresponding 3D point in the normalized camera frame ([x, y, z=1]) using the intrinsic parameters.
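With the extrinsic matrix, you can get the point in the camera frame:
$$ \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} = R \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + t $$
The normalized camera frame is obtained by dividing by the depth:
$$ x = X_c / Z_c, \qquad y = Y_c / Z_c $$
Assuming no distortion, the image point is (f_x, f_y, c_x, c_y being the focal lengths and principal point stored in K):
$$ u = f_x \, x + c_x, \qquad v = f_y \, y + c_y $$
And the "reverse transformation" assuming no distortion:
$$ x = (u - c_x) / f_x, \qquad y = (v - c_y) / f_y $$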

Related

extrinsic matrix computation with opencv

I am using opencv to calibrate my webcam. So, what I have done is fixed my webcam to a rig, so that it stays static and I have used a chessboard calibration pattern and moved it in front of the camera and used the detected points to compute the calibration. So, this is as we can find in many opencv examples (https://docs.opencv.org/3.1.0/dc/dbb/tutorial_py_calibration.html)
Now, this gives me the camera intrinsic matrix and a rotation and translation component for mapping each of these chessboard views from the chessboard space to world space.
However, what I am interested in is the global extrinsic matrix i.e. once I have removed the checkerboard, I want to be able to specify a point in the image scene i.e. x, y and its height and it gives me the position in the world space. As far as I understand, I need both the intrinsic and extrinsic matrix for this. How should one proceed to compute the extrinsic matrix from here? Can I use the measurements that I have already gathered from the chessboard calibration step to compute the extrinsic matrix as well?
Let me place some context. Consider the following picture, (from https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html):
The camera has "attached" a rigid reference frame (Xc,Yc,Zc). The intrinsic calibration that you successfully performed allows you to convert a point (Xc,Yc,Zc) into its projection on the image (u,v), and a point (u,v) in the image to a ray in (Xc,Yc,Zc) (you can only get it up to a scaling factor).
In practice, you want to place the camera in an external "world" reference frame, let's call it (X,Y,Z). Then there is a rigid transformation, represented by a rotation matrix, R, and a translation vector T, such that:
|Xc| |X|
|Yc|= R |Y| + T
|Zc| |Z|
That's the extrinsic calibration (which can be written also as a 4x4 matrix, that's what you call the extrinsic matrix).
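As a small illustration of that relation, here is a NumPy sketch with a made-up R and T (a 90 degree rotation about Z plus a translation). It writes the same rigid transformation both as R, T and as a 4x4 matrix, and inverts it.

```python
import numpy as np

# Hypothetical pose: 90 degree rotation about the Z axis plus a translation.
R = np.array([[0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
T = np.array([0.5, 0.0, 2.0])

# The same rigid transformation written as a 4x4 homogeneous matrix.
E = np.eye(4)
E[:3, :3] = R
E[:3, 3] = T

X_world = np.array([1.0, 2.0, 0.0])

# World -> camera, in both notations.
X_cam = R @ X_world + T
X_cam_h = E @ np.append(X_world, 1.0)

# Camera -> world: undo the translation, then the rotation (R is orthonormal).
X_back = R.T @ (X_cam - T)
```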
Now, the answer. To obtain R and T, you can do the following:
1. Fix your world reference frame, for example the ground can be the (x,y) plane, and choose an origin for it.
2. Set some points with known coordinates in this reference frame, for example, points in a square grid on the floor.
3. Take a picture and get the corresponding 2D image coordinates.
4. Use solvePnP to obtain the rotation and translation, with the following parameters:
   objectPoints: the 3D points in the world reference frame.
   imagePoints: the corresponding 2D points in the image in the same order as objectPoints.
   cameraMatrix: the intrinsic matrix you already have.
   distCoeffs: the distortion coefficients you already have.
   rvec, tvec: these will be the outputs.
   useExtrinsicGuess: false
   flags: you can use CV_ITERATIVE (exposed as SOLVEPNP_ITERATIVE in current OpenCV versions)
5. Finally, get R from rvec with the Rodrigues function.
You will need at least 4 points with corresponding 3D-2D coordinates, no three of them collinear, for solvePnP to work, but more is better. To have good quality points, you could print a big chessboard pattern, put it flat on the floor, and use it as a grid. What's important is that the pattern is not too small in the image (the larger, the more stable your calibration will be).
And, very important: for the intrinsic calibration, you used a chess pattern with squares of a certain size, but you told the algorithm (which does kind of solvePnPs for each pattern), that the size of each square is 1. This is not explicit, but is done in line 10 of the sample code, where the grid is built with coordinates 0,1,2,...:
objp[:,:2] = np.mgrid[0:7,0:6].T.reshape(-1,2)
And the scale of the world for the extrinsic calibration should be chosen with this in mind, so you have several possibilities:
1. Use the same scale, for example by using the same grid or by measuring the coordinates of your "world" plane in the same scale. In this case, your "world" won't be at the right scale.
2. Recommended: redo the intrinsic calibration with the right scale, something like:
objp[:,:2] = (size_of_a_square*np.mgrid[0:7,0:6]).T.reshape(-1,2)
Where size_of_a_square is the real size of a square.
3. (Haven't done this, but it is theoretically possible; do it if you can't do 2) Reuse the intrinsic calibration as it is. This is possible because the camera sees everything up to a scale factor: the declared size of a square does not change fx and fy, which are expressed in pixels, only the T in the pose for each view (but that's another story). If the actual size of a square is L, give solvePnP world coordinates in real units (L times the grid indices) and the resulting tvec will come out in those units.
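The "up to a scale factor" remark can be checked with a few lines of NumPy. This is a sketch under assumptions: the camera matrix, the pose and the square size L = 0.025 are invented, and distortion is ignored. It projects the unit-square grid and the same grid in real units, with the translation scaled by the same factor.

```python
import numpy as np

def project(K, R, t, X):
    """Pinhole projection without distortion of world points X with shape (N, 3)."""
    Xc = X @ R.T + t                 # world -> camera
    uvw = Xc @ K.T                   # camera -> homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:3]

# Hypothetical intrinsics and pose, for illustration only.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t_units = np.array([-3.0, -2.5, 10.0])   # translation expressed in "squares"

# Grid built as in the tutorial: coordinates 0, 1, 2, ... (square size 1).
grid_units = np.zeros((42, 3))
grid_units[:, :2] = np.mgrid[0:7, 0:6].T.reshape(-1, 2)

L = 0.025                                # assumed real square size
pix_units = project(K, R, t_units, grid_units)
pix_real = project(K, R, L * t_units, L * grid_units)

same_pixels = np.allclose(pix_units, pix_real)
```

Both projections land on the same pixels with the same K, so the image alone cannot tell the two scales apart. The scale enters only through the object points and the translation.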

What is the difference between solvePnP and calibrateCamera in opencv?

calibrateCamera() provides rvec, tvec, distCoeff and cameraMatrix whereas solvePnP() takes cameraMatrix, distCoeff as input and provides rvec, tvec as output. What is the difference between these two functions?
cv::calibrateCamera(...)
The function estimates the following parameters of a monocular camera from several views of a calibration pattern. The geometry of this pattern is usually known (i.e. it can be a chessboard):
The linear intrinsic parameters: the focal lengths in terms of pixels (these are basically scale factors), the principal point which would be ideally in the center of the image, and sometimes a skew coefficient between the x and the y axis (but this is often zero).
The non-linear intrinsic parameters: the previously mentioned parameters form the linear camera matrix, but there are also some non-linear parameters in the transformation from the 3D camera frame to the 2D image plane, i.e. the lens distortion.
The extrinsic parameters: the transformation matrix between the 3D world and 3D camera coordinate systems.
The estimation of the above mentioned parameters is usually based on 2D-3D correspondences. The algorithm detects some 2D points in the image (e.g. chessboard corners) for which the corresponding 3D object points are specified (known 3D geometry). It performs the following steps in the simplest case (this can vary depending on the flags of cv::calibrateCamera(..., int flags, ...)):
Computes the linear intrinsic parameters and assumes the non-linear ones are zero.
Estimates the initial camera pose (extrinsics) as a function of the approximated intrinsics. This is done using cv::solvePnP(...).
Performs the Levenberg-Marquardt optimization algorithm to minimize the re-projection error between the detected 2D image points and the 2D projections of the 3D object points. This is done using cv::projectPoints(...).
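To make the quantity being minimized concrete, here is a NumPy sketch of a re-projection error for one view. It is a simplification and not OpenCV's internal code: distortion is left out, and the function name, camera matrix and points are invented for the example.

```python
import numpy as np

def reprojection_rms(K, R, t, object_points, image_points):
    """RMS distance in pixels between detected points and projected model points."""
    Xc = object_points @ R.T + t            # model frame -> camera frame
    proj = Xc @ K.T
    proj = proj[:, :2] / proj[:, 2:3]       # perspective division
    residuals = proj - image_points
    return np.sqrt(np.mean(np.sum(residuals ** 2, axis=1)))

# Hypothetical camera, pose and planar pattern.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])
object_points = np.array([[0.0, 0.0, 0.0],
                          [1.0, 0.0, 0.0],
                          [1.0, 1.0, 0.0],
                          [0.0, 1.0, 0.0]])

# Perfect detections give zero error.
Xc = object_points @ R.T + t
exact = (Xc @ K.T)[:, :2] / (Xc @ K.T)[:, 2:3]
rms_exact = reprojection_rms(K, R, t, object_points, exact)

# Detections that are all off by (3, 4) pixels are 5 pixels away each.
rms_shifted = reprojection_rms(K, R, t, object_points, exact + np.array([3.0, 4.0]))
```

The optimizer adjusts the intrinsics, the distortion and the pose of every view to drive this kind of error down over all images at once.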
cv::solvePnP(...)
At this point, I have also implicitly answered the role of cv::solvePnP(...), as it is a part of cv::calibrateCamera(...).
Once you have the intrinsics of a camera, you can assume that these will never change (unless you change the optics or the zoom). On the other hand the extrinsics can change, i.e. you can rotate the camera or put it in another location. The scenario of changing an object's pose relative to the camera is very similar. And this is what cv::solvePnP(...) is used for.
The function estimates the object pose given:
A set of 3D object points in a model coordinate system (can be the 3D world as well),
Their 2D projections on the image plane,
The linear and non-linear intrinsic parameters.
The output of cv::solvePnP(...) is given as a rotation vector (rvec) together with a translation vector (tvec) that bring the 3D object points from the model coordinate system to the 3D camera coordinate system.
calibrateCamera (doc) estimates intrinsics coefficients (i.e. camera matrix and distortion coefficients) for a given camera. This function requires you to provide as input N sets of 2D-3D correspondences, associated to N images taken with the same camera from varying viewpoints (typically N=30, see this tutorial on this topic). The function returns the camera matrix and distortion coefficients for the considered camera. Although those are usually not used, the extrinsics parameters (i.e. position and orientation) are also estimated, hence the function returns one pair of rvec and tvec for each of the N input images.
solvePnP (doc) estimates extrinsics parameters for a given camera image. This function requires you to provide a set of 2D-3D correspondences, associated to a single image taken with a camera with known intrinsics parameters. The function returns a single pair of rvec and tvec, corresponding to the input image.
calibrateCamera() provides rvec, tvec, distCoeff and cameraMatrix. distCoeffs describe the lens distortion of the image, and cameraMatrix provides the principal point (Cx and Cy) and the focal lengths (Fx and Fy). These are called intrinsic parameters. Unless you change the aperture/focus of the camera they will remain the same. [It also provides rvec and tvec, which are known as extrinsic parameters. They describe the position of the camera relative to the calibration pattern in each calibration image. I don't yet know what use they could have.]
solvePnP() takes cameraMatrix and distCoeff as input and provides rvec and tvec. Using Cx, Cy, Fx and Fy it can estimate the current position of the camera, i.e. the extrinsic parameters.
In other words, first use calibrateCamera() to obtain the cameraMatrix and distCoeff. Use them in solvePnP() and it will tell you the rotation (rvec) and translation (tvec) of the camera as you move the camera with respect to your real world object (with some marker, as you can presume).

finding the real world coordinates of an image point

I have been searching lots of resources on the internet for many days but I couldn't solve the problem.
I have a project in which I am supposed to detect the position of a circular object on a plane. Since it is on a plane, all I need is the x and y position (not z). For this purpose I have chosen to go with image processing. The camera (single view, not stereo) position and orientation are fixed with respect to a reference coordinate system on the plane and are known.
I have detected the image pixel coordinates of the centers of the circles by using OpenCV. All I need now is to convert the coordinates to the real world.
http://www.packtpub.com/article/opencv-estimating-projective-relations-images
On this site and on other sites as well, a homographic transformation is given as:
p = C[R|T]P; where P is the real world coordinates and p is the pixel coordinates (in homogeneous coordinates). C is the camera matrix representing the intrinsic parameters, R is the rotation matrix and T is the translation vector. I have followed a tutorial on calibrating the camera with OpenCV (applied the cameraCalibration source file), I have 9 fine chessboard images, and as an output I have the intrinsic camera matrix, and the translational and rotational params of each of the images.
I have the 3x3 intrinsic camera matrix (focal lengths and center pixels), and a 3x4 extrinsic matrix [R|T], in which R is the left 3x3 and T is the right 3x1. According to the p = C[R|T]P formula, I assume that by multiplying these parameter matrices with P (world) we get p (pixel). But what I need is to project the p (pixel) coordinates to P (world coordinates) on the ground plane.
I am studying electrical and electronics engineering. I did not take image processing or advanced linear algebra classes. As I remember from my linear algebra course we can manipulate a transformation as P=[R|T]-1*C-1*p. However this is in a Euclidean coordinate system. I don't know whether such a thing is possible in homogeneous coordinates. Moreover the 3x4 [R|T] matrix is not invertible, and I don't know whether this is the correct way to go.
Intrinsic and extrinsic parameters are known. All I need is the real world coordinate projected on the ground plane. Since the point is on a plane, the coordinates will be 2-dimensional (depth is not important, as an argument against the limits of single view geometry). The camera is fixed (position, orientation). How should I find the real world coordinates of a point on an image captured by a camera (single view)?
EDIT
I have been reading "Learning OpenCV" by Gary Bradski & Adrian Kaehler. On page 386, under the Calibration->Homography section, it is written: q = sMWQ where M is the camera intrinsic matrix, W is the 3x4 [R|T], s is an "up to" scale factor that I assume is related to the homography concept (I don't know clearly), q is the pixel coordinate and Q is the real coordinate. It is said that, in order to get the real world coordinates (on the chessboard plane) of an object detected on the image plane, we take Z=0, so the third column of W is not needed either (the rotation about that axis, I assume); trimming these unnecessary parts, W is a 3x3 matrix. H=MW is a 3x3 homography matrix. Now we can invert the homography matrix and left multiply it with q to get Q=[X Y 1], where the Z coordinate was trimmed.
I applied the mentioned algorithm, and I got some results that cannot lie between the image corners (the image plane was parallel to the camera plane, about 30 cm in front of the camera, and I got results like 3000) (chessboard square sizes were entered in millimeters, so I assume the output real world coordinates are again in millimeters). Anyway I am still trying stuff. By the way, the raw results were very, very large, but I divide all values in Q by the third component of Q to get (X,Y,1).
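For reference, the book's recipe can be sketched in NumPy as below. The numbers are assumptions (a made-up camera matrix and a plane tilted 30 degrees, about 300 mm from the camera), and R, T must be the pose of the very plane the object lies on. The column that drops out for Z=0 is the third column of R, so the trimmed W is [r1 r2 T].

```python
import numpy as np

# Hypothetical intrinsics and pose of the plane (units: pixels and millimetres).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
a = np.deg2rad(30.0)
R = np.array([[1.0, 0.0, 0.0],
              [0.0, np.cos(a), -np.sin(a)],
              [0.0, np.sin(a), np.cos(a)]])
T = np.array([-50.0, -40.0, 300.0])

# For points with Z = 0 the third column of R multiplies zero, so W = [r1 r2 T].
H = K @ np.column_stack((R[:, 0], R[:, 1], T))

# Forward: plane point (X, Y, 1) -> pixel (u, v, 1) after dividing by the scale.
Q_true = np.array([120.0, 80.0, 1.0])
q = H @ Q_true
q /= q[2]

# Backward: pixel -> plane point, again dividing by the third component.
Q = np.linalg.solve(H, q)
Q /= Q[2]
```

If the recovered coordinates come out implausibly large, one thing to check is whether R and T really belong to the plane in question and use the same units as the expected result.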
FINAL EDIT
I could not get the camera calibration methods to work. Anyway, I should have started with perspective projection and transform. This way I made very good estimates with a perspective transform between the image plane and the physical plane (having generated the transform from 4 pairs of corresponding coplanar points on both planes). Then I simply applied the transform to the image pixel points.
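A transform of that kind can be computed from the 4 point pairs by solving a small linear system. The sketch below is plain NumPy with invented pixel and millimetre coordinates, and the helper names are hypothetical. OpenCV offers cv2.getPerspectiveTransform and cv2.perspectiveTransform for the same job.

```python
import numpy as np

def homography_from_4_points(src, dst):
    """Solve for the 3x3 perspective transform (h33 fixed to 1) mapping src to dst."""
    A, b = [], []
    for (x, y), (X, Y) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -X * x, -X * y])
        b.append(X)
        A.append([0, 0, 0, x, y, 1, -Y * x, -Y * y])
        b.append(Y)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, pts):
    """Map (N, 2) points through H, including the division by the third component."""
    pts_h = np.column_stack((pts, np.ones(len(pts))))
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical correspondences: image corners (pixels) -> plane corners (mm).
src = np.array([[100.0, 400.0], [540.0, 410.0], [420.0, 120.0], [180.0, 130.0]])
dst = np.array([[0.0, 0.0], [300.0, 0.0], [300.0, 200.0], [0.0, 200.0]])

H = homography_from_4_points(src, dst)
mapped = apply_homography(H, src)

# Any other detected pixel on the plane, e.g. a circle centre, maps the same way.
circle_mm = apply_homography(H, np.array([[320.0, 260.0]]))
```

This needs no intrinsics at all, but lens distortion is not modelled, so it works best when the image has been undistorted first or the distortion is small.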
You said "I have the intrinsic camera matrix, and translational and rotational params of each of the images", but these are the translation and rotation from your camera to your chessboard. These have nothing to do with your circle. However, if you really have the translation and rotation matrices for your plane, then getting the 3D point is really easy.
Apply the inverse intrinsic matrix to your screen points in homogeneous notation: C-1*[u, v, 1]. If C is the OpenCV camera matrix, which already contains the principal point, u and v are simply the image column and row; the shifts u=col-w/2 and v=h/2-row, where col, row are the image column and row and w, h are the image width and height, are only needed when C holds the focal lengths alone. As a result you will obtain a 3D point in so-called camera normalized coordinates p = [x, y, 1]T. This fixes only the viewing ray: the real point is s*p for some unknown depth s. All you need to do now is to subtract the translation and apply a transposed rotation: P=RT(s*p-T), choosing s so that P lies on your plane (Z=0 for the ground plane). The order of operations is the inverse of the original, which was rotate and then translate; note that the transposed rotation does the inverse operation to the original rotation but is much faster to calculate than R-1.
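Here is one way this could look in NumPy. It is a sketch under assumptions: C is an OpenCV-style camera matrix, R and T map world to camera coordinates, the plane is Z=0 in world coordinates, distortion is ignored, and the function name and all numbers are invented.

```python
import numpy as np

def pixel_to_ground(u, v, C, R, T):
    """Intersect the viewing ray of pixel (u, v) with the world plane Z = 0."""
    ray_cam = np.linalg.solve(C, np.array([u, v, 1.0]))   # C^-1 [u, v, 1]
    ray_world = R.T @ ray_cam                             # ray direction in world frame
    cam_center = -R.T @ T                                 # camera position in world frame
    s = -cam_center[2] / ray_world[2]                     # depth that gives Z = 0
    return cam_center + s * ray_world                     # same as R^T (s*p - T)

# Hypothetical camera looking down at the ground, tilted by 150 degrees about X.
C = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
a = np.deg2rad(150.0)
R = np.array([[1.0, 0.0, 0.0],
              [0.0, np.cos(a), -np.sin(a)],
              [0.0, np.sin(a), np.cos(a)]])
T = np.array([0.0, 0.0, 2.0])

# Round trip: project a known ground point, then recover it from its pixel.
P_true = np.array([0.4, 0.3, 0.0])
p_h = C @ (R @ P_true + T)
u, v = p_h[:2] / p_h[2]
P = pixel_to_ground(u, v, C, R, T)
```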

Consistency of projecting points onto an undistorted image

I want to project a point in 3D space into 2D image coordinates. I have the calibrated intrinsics and extrinsics of the camera I'm using. I have the camera matrix K and distortion coefficients D. However, I want the projected image coordinates to be of the undistorted image.
From my research, I found two ways to do this.
Use opencv's getOptimalNewCameraMatrix function to compute a new undistorted image's camera matrix K'. Then use this K' in opencv's projectPoints function, with the distortion parameters set to 0, to get the projected point.
Use projectPoints function using the raw camera matrix K, along with the distortion coefficients D in this function and get the projected point.
Should the output of both methods match?
I think that there is something missing in your thought.
The camera matrix K and the distortion coefficients D are the parameters used for the undistortion (if your lens distorts the image, as a fisheye does). They are what are called the intrinsic camera parameters.
If we change terms from computer vision to computer graphics, those parameters are the one you use for defining the frustum of the view, and, for example, they are used for getting the focal length of the camera.
But they are not enough to do the projection stuff.
For the projection, if you think in computer graphics terms (like OpenGL, for instance) you need to have the model-view-projection matrix. The model matrix is the matrix that specifies the position of the object in the world. The view matrix specifies the position of the camera, and the projection matrix specifies the frustum (focal angle, perspective distortion, etc).
If you want to know how to transform the points of the model from 3D to 2D (or vice versa) you need the projection and the view matrices (you have the model matrix because you have the 3D points from which you want to start). And in computer vision the view matrix is called the extrinsic parameters.
So, you need the extrinsic parameters too, which are the position of the camera in the world. For instance, those parameters are the rvec and tvec that cv::projectPoints needs.
If you want to compute them, they are exactly the output of cv::solvePnP, which does the opposite of what you want to do: from some known 3D points coupled with their known 2D projections on the camera screen, this function gives you the extrinsic parameters (from which you can get the view matrix for some opengl-opencv-augmented-reality-whatever application via cv::Rodrigues).
Last note: while the intrinsic parameters are fixed in all the pictures you shoot with a camera (as long as you don't change the focal length, of course), the extrinsic parameters change every time you move the camera to take a new picture from a different viewpoint (that is: this changes the perspective point of view, and so the 3D-2D projection you want to find).
Hope could help!
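Coming back to the original question of whether the two methods should match, a small numerical experiment may help. The sketch below is plain NumPy with invented values and implements only the radial terms k1 and k2 of the distortion model. It projects one camera-frame point with and without distortion using the same K.

```python
import numpy as np

def project(point_cam, K, k1=0.0, k2=0.0):
    """Pinhole projection of one camera-frame point with optional radial distortion."""
    x = point_cam[0] / point_cam[2]
    y = point_cam[1] / point_cam[2]
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2     # radial distortion factor
    u = K[0, 0] * x * factor + K[0, 2]
    v = K[1, 1] * y * factor + K[1, 2]
    return np.array([u, v])

# Hypothetical intrinsics and distortion, for illustration only.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
k1, k2 = -0.2, 0.05

point_cam = np.array([0.3, 0.2, 1.0])

pix_raw = project(point_cam, K, k1, k2)   # where the point appears in the raw image
pix_ideal = project(point_cam, K)         # where it appears in an undistorted image using K

gap_px = np.linalg.norm(pix_raw - pix_ideal)
```

The two results differ by several pixels, which suggests the outputs of the two methods are not expected to match: projecting with K and D gives coordinates on the raw image, while projecting with zero distortion gives coordinates on an undistorted image rendered with whatever camera matrix is passed in (K here, K' in the question).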

Is it possible to obtain fronto-parallel view of image or camera position by 2d-3d points relation?

Is it possible to obtain a fronto-parallel view of an image, or the camera position, from a 2D-3D point relation using OpenCV?
For this I have the intrinsic and extrinsic parameters. I also have the 3D coordinates of a set of control points (which lie in one plane) on the image (the 2D-3D relation).
In fact I need the location and orientation of the camera, but it is not difficult to find it if I can convert the image to a fronto-parallel view.
If it is not possible to do this with OpenCV, are there other libraries which can solve this task?
Solution is based on the formulas in the OpenCV documentation Camera Calibration and 3D Reconstruction
Let's consider the scalar form of the equations, without distortion coefficients (as opposed to the matrix form).
We have u and v.
It is easy to calculate x' and y'.
But x and y cannot be calculated because we can choose any non-zero z.
A line in 3D corresponds to one point in the 2D image.
To solve this we take two points, for z=1 and z=2.
Then we find 2 points in 3D space which specify the line: (x1,y1,z1) and (x2,y2,z2).
Then we can apply the inverse extrinsic transformation (subtract t, then apply R-1) to (x1,y1,z1) and (x2,y2,z2), which results in a line determined by two points (X1, Y1, Z1) and (X2, Y2, Z2).
Since our control points lie in one plane (let the plane be Z=0 for simplicity) we can find the X and Y where this line meets the plane, which is a point in 3D.
After converting from mm to pixels we obtain the fronto-parallel image.
(If the input image is distorted we should undistort it first.)

Resources