What is the difference between solvePnP and calibrateCamera in opencv?

calibrateCamera() provides rvec, tvec, distCoeff and cameraMatrix whereas solvePnP() takes cameraMatrix, distCoeff as input and provides rvec, tvec as output. What is the difference between these two functions?

cv::calibrateCamera(...) estimates the following parameters of a monocular camera from several views of a calibration pattern whose geometry is known (e.g. a chessboard):
The linear intrinsic parameters: the focal lengths in pixels (essentially scale factors), the principal point, which ideally lies at the center of the image, and sometimes a skew coefficient between the x and y axes (often zero).
The non-linear intrinsic parameters: the parameters above form the linear camera matrix, but the transformation from the 3D camera frame to the 2D image plane also has non-linear terms, namely the lens distortion.
The extrinsic parameters: the transformation between the 3D world and 3D camera coordinate systems.
The estimation of these parameters is usually based on 2D-3D correspondences. The algorithm detects 2D points in the image (e.g. chessboard corners) whose corresponding 3D object points are known from the pattern geometry. In the simplest case it performs the following steps (the details depend on the flags argument of cv::calibrateCamera(..., int flags, ...)):
Computes the linear intrinsic parameters and sets the non-linear ones to zero.
Estimates the initial camera pose (extrinsics) from the approximated intrinsics. This is done using cv::solvePnP(...).
Runs the Levenberg-Marquardt optimization to minimize the re-projection error between the detected 2D image points and the 2D projections of the 3D object points. The projections are computed using cv::projectPoints(...).
This also implicitly explains the role of cv::solvePnP(...), since it is one step of cv::calibrateCamera(...).
Once you have the intrinsics of a camera, you can assume they will not change (unless you change the optics or the zoom). The extrinsics, on the other hand, can change: you can rotate the camera or move it to another location. Changing the pose of an object relative to a fixed camera is equivalent, since only the relative pose matters. Estimating that pose is what cv::solvePnP(...) is used for.
cv::solvePnP(...) estimates the object pose given:
A set of 3D object points in a model coordinate system (this can also be the 3D world),
Their 2D projections on the image plane,
The linear and non-linear intrinsic parameters.
The output of cv::solvePnP(...) is given as a rotation vector (rvec) together with a translation vector (tvec) that bring the 3D object points from the model coordinate system to the 3D camera coordinate system.
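To make that convention concrete, here is a NumPy-only sketch of what the two outputs mean: rvec is an axis-angle vector that Rodrigues' formula turns into a rotation matrix R (cv2.Rodrigues does this in OpenCV), and the pair then maps model points into the camera frame as X_c = R X + t. The helper names rodrigues and model_to_camera are illustrative, not OpenCV API.

```python
import numpy as np


def rodrigues(rvec):
    """Axis-angle vector (like solvePnP's rvec) -> 3x3 rotation matrix, via Rodrigues' formula."""
    rvec = np.asarray(rvec, dtype=float).reshape(3)
    theta = np.linalg.norm(rvec)             # rotation angle in radians
    if theta < 1e-12:
        return np.eye(3)                      # no rotation
    k = rvec / theta                          # unit rotation axis
    k_cross = np.array([[0.0, -k[2], k[1]],
                        [k[2], 0.0, -k[0]],
                        [-k[1], k[0], 0.0]])  # skew-symmetric cross-product matrix of k
    return np.eye(3) + np.sin(theta) * k_cross + (1.0 - np.cos(theta)) * (k_cross @ k_cross)


def model_to_camera(object_points, rvec, tvec):
    """Map Nx3 model-frame points into the camera frame: X_c = R X + t."""
    R = rodrigues(rvec)
    t = np.asarray(tvec, dtype=float).reshape(3)
    return np.asarray(object_points, dtype=float) @ R.T + t
```

For example, rvec = [0, 0, pi/2] with tvec = [0, 0, 5] rotates the model point [1, 0, 0] onto the camera's Y axis and pushes it 5 units along the optical axis, giving [0, 1, 5].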

calibrateCamera estimates the intrinsic parameters (i.e. the camera matrix and distortion coefficients) of a given camera. It requires as input N sets of 2D-3D correspondences, associated with N images taken with the same camera from varying viewpoints (typically N=30, see this tutorial on this topic). The function returns the camera matrix and distortion coefficients of the camera. Although they are usually not used, the extrinsic parameters (i.e. position and orientation) are also estimated, so the function also returns one pair of rvec and tvec for each of the N input images.
solvePnP estimates the extrinsic parameters for a given camera image. It requires a set of 2D-3D correspondences, associated with a single image taken by a camera with known intrinsic parameters. The function returns a single pair of rvec and tvec, corresponding to the input image.

calibrateCamera() provides rvec, tvec, distCoeffs and cameraMatrix. distCoeffs describe the lens distortion of the image, and cameraMatrix contains the principal point (cx, cy), which is roughly the center of the image, and the focal lengths (fx, fy) in pixels. These are called intrinsic parameters, and they stay the same unless you change the focus or zoom of the camera. [It also provides rvec and tvec; I don't yet know what they could be used for. They give the pose of the calibration pattern relative to the camera for each view. rvec and tvec are also known as extrinsic parameters.]
solvePnP() takes cameraMatrix and distCoeffs as input and provides rvec and tvec. Using cx, cy, fx, fy (and the distortion coefficients) together with 2D-3D correspondences, it estimates the current pose of the camera, i.e. the extrinsic parameters.
In other words, first use calibrateCamera() to obtain cameraMatrix and distCoeffs. Then pass them to solvePnP(), and it will tell you the rotation (rvec) and translation (tvec) of the camera as you move it with respect to your real-world object (for example, a marker).


extrinsic matrix computation with opencv

I am using OpenCV to calibrate my webcam. I have fixed the webcam to a rig so that it stays static, moved a chessboard calibration pattern in front of it, and used the detected points to compute the calibration, as in many OpenCV examples (https://docs.opencv.org/3.1.0/dc/dbb/tutorial_py_calibration.html).
This gives me the camera intrinsic matrix and, for each chessboard view, a rotation and translation that map points from chessboard space to camera space.
However, what I am interested in is the global extrinsic matrix: once I have removed the checkerboard, I want to specify a point in the image (x, y) together with its height and get its position in world space. As far as I understand, I need both the intrinsic and extrinsic matrices for this. How should I compute the extrinsic matrix from here? Can I use the measurements already gathered in the chessboard calibration step to compute it as well?
Let me give some context. Consider the camera model figure in the OpenCV documentation (https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html).
The camera has a rigid reference frame (Xc, Yc, Zc) attached to it. The intrinsic calibration you performed allows you to convert a point (Xc, Yc, Zc) into its projection (u, v) on the image, and a point (u, v) in the image into a ray in (Xc, Yc, Zc) (a point on that ray is only determined up to a scale factor).
In practice, you want to place the camera in an external "world" reference frame, let's call it (X,Y,Z). Then there is a rigid transformation, represented by a rotation matrix, R, and a translation vector T, such that:
$$\begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} = R \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + T$$
This is the extrinsic calibration. It can also be written as a 4x4 matrix, which is what you call the extrinsic matrix.
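A minimal NumPy sketch of that 4x4 form, assuming R and T are already available (e.g. from solvePnP followed by cv2.Rodrigues); the helper names are hypothetical. Stacking them as [[R, T], [0, 0, 0, 1]] lets the transformation act on homogeneous points [X, Y, Z, 1].

```python
import numpy as np


def extrinsic_matrix(R, T):
    """Stack R (3x3) and T (3,) into the 4x4 world-to-camera matrix [[R, T], [0, 0, 0, 1]]."""
    E = np.eye(4)
    E[:3, :3] = np.asarray(R, dtype=float)
    E[:3, 3] = np.asarray(T, dtype=float).reshape(3)
    return E


def world_to_camera(E, X):
    """Apply E to an Nx3 array of world points using homogeneous coordinates."""
    X = np.asarray(X, dtype=float)
    Xh = np.hstack([X, np.ones((len(X), 1))])  # append w = 1 to each point
    return (Xh @ E.T)[:, :3]                   # drop w, which stays 1 for a rigid transform
```

Because the matrix is rigid, np.linalg.inv(E) gives the opposite, camera-to-world transform.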
Now, the answer. To obtain R and T, you can do the following:
Fix your world reference frame (for example, take the ground as the (X, Y) plane) and choose an origin for it.
Set some points with known coordinates in this reference frame, for example, points in a square grid in the floor.
Take a picture and get the corresponding 2D image coordinates.
Use solvePnP to obtain the rotation and translation, with the following parameters:
objectPoints: the 3D points in the world reference frame.
imagePoints: the corresponding 2D points in the image in the same order as objectPoints.
cameraMatrix: the intrinsic matrix you already have.
distCoeffs: the distortion coefficients you already have.
rvec, tvec: these will be the outputs.
useExtrinsicGuess: false
flags: you can use SOLVEPNP_ITERATIVE (CV_ITERATIVE in the old C API)
Finally, get R from rvec with the Rodrigues function.
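With R and T in hand, the goal stated in the question (an image point plus a known height giving a world position) can be sketched by back-projecting the pixel into a ray and intersecting it with the plane Z = height. This NumPy sketch assumes the pixel has already been undistorted (for example with cv2.undistortPoints and its P argument set to the camera matrix), that X_c = R X + T maps world to camera coordinates, and that heights are measured along the world Z axis with the sign your object points define. The function name is hypothetical.

```python
import numpy as np


def pixel_to_world_at_height(u, v, height, K, R, T):
    """Back-project pixel (u, v) and intersect the ray with the world plane Z = height.

    Assumes (u, v) is undistorted and that X_c = R X + T maps world to camera coordinates.
    """
    K = np.asarray(K, dtype=float)
    R = np.asarray(R, dtype=float)
    T = np.asarray(T, dtype=float).reshape(3)
    C = -R.T @ T                                 # camera centre in world coordinates
    d = R.T @ np.linalg.solve(K, [u, v, 1.0])    # ray direction in world coordinates
    if abs(d[2]) < 1e-12:
        raise ValueError("ray is parallel to the plane Z = height")
    s = (height - C[2]) / d[2]                   # ray parameter where Z equals height
    return C + s * d
```

The scale factor s is what the intrinsic calibration alone cannot provide; the known height fixes it.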
In theory 3 non-collinear points with corresponding 3D-2D coordinates are the minimum, but that case is ambiguous and solvePnP generally needs at least 4; more is better. To get good-quality points, you could print a big chessboard pattern, lay it flat on the floor, and use it as a grid. What's important is that the pattern is not too small in the image (the larger it is, the more stable your calibration will be).
Also note: for the intrinsic calibration, you used a chess pattern with squares of a certain size, but you told the algorithm (which effectively runs a solvePnP for each view of the pattern) that the size of each square is 1. This is implicit in line 10 of the sample code, where the grid is built with coordinates 0, 1, 2, ...:
objp[:,:2] = np.mgrid[0:7,0:6].T.reshape(-1,2)
As a result, the tvecs returned by the calibration are in units of one square. To keep the units of the calibration and of your world consistent, you have several possibilities:
Use the same scale, e.g. by using the same grid or by measuring the coordinates of your "world" plane in units of one square. In this case, your "world" won't be at real-world scale.
Recommended: redo the intrinsic calibration with the right scale, something like:
objp[:,:2] = (size_of_a_square*np.mgrid[0:7,0:6]).T.reshape(-1,2)
Where size_of_a_square is the real size of a square.
Reuse the intrinsic calibration unchanged. The camera sees everything only up to a scale factor: scaling all object points by L leaves fx, fy, cx, cy and the distortion coefficients unchanged and only scales the T of each calibration view by L. So you can pass your existing camera matrix and distortion coefficients to solvePnP together with world points measured in real units; only the per-view tvecs from the original calibration remain in units of one square.
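The recommended redo of the calibration only changes how the object points are built. A small NumPy sketch wrapping the tutorial's objp construction (the function name and default 7x6 grid are assumptions taken from the sample):

```python
import numpy as np


def chessboard_object_points(cols=7, rows=6, square_size=1.0):
    """3D corner coordinates of a planar chessboard (Z = 0), in the units of square_size.

    Same layout as the tutorial's objp: x varies fastest, then y.
    """
    objp = np.zeros((cols * rows, 3), np.float32)
    objp[:, :2] = (square_size * np.mgrid[0:cols, 0:rows]).T.reshape(-1, 2)
    return objp
```

For instance, chessboard_object_points(7, 6, 25.0) expresses the corners in millimetres for 25 mm squares, so any translations computed from them are in millimetres too.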

Difference between undistortPoints() and projectPoints() in OpenCV

From my understanding, undistortPoints takes a set of points on a distorted image, and calculates where their coordinates would be on an undistorted version of the same image. Likewise, projectPoints maps a set of object coordinates to their corresponding image coordinates.
However, I am unsure whether projectPoints maps the object coordinates to image points on the distorted image (i.e. the original image) or on one that has been undistorted (straight lines).
Furthermore, the OpenCV documentation for undistortPoints states that 'the function performs a reverse transformation to projectPoints()'. Could you please explain how this is so?
Quote from the 3.2 documentation for projectPoints():
Projects 3D points to an image plane.
The function computes projections of 3D points to the image plane given intrinsic and extrinsic camera parameters.
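projectPoints() has the parameter distCoeffs, so when you pass non-zero coefficients, the returned points are in the original (distorted) image: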
If the vector is empty, the zero distortion coefficients are assumed.
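With no distortion, the equation is:
$$s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \, [R \mid t] \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$
where K is the intrinsic matrix, [R | t] is the extrinsic matrix (the transformation of a point from the object or world frame to the camera frame), and s is a scale factor equal to the depth of the point in the camera frame.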
For undistortPoints(), you have the parameter R:
Rectification transformation in the object space (3x3 matrix). R1 or R2 computed by cv::stereoRectify can be passed here. If the matrix is empty, the identity transformation is used.
The reverse transformation is the operation where you compute for a 2D image point ([u, v]) the corresponding 3D point in the normalized camera frame ([x, y, z=1]) using the intrinsic parameters.
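With the extrinsic matrix, you can get the point in the camera frame:
$$\begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} = R \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + t$$
The normalized camera frame is obtained by dividing by the depth:
$$x = \frac{X_c}{Z_c}, \qquad y = \frac{Y_c}{Z_c}$$
Assuming no distortion, the image point is:
$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$
And the "reverse transformation", assuming no distortion, is:
$$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = K^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}$$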
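The forward and reverse mappings can be sketched in NumPy. To show why undistortPoints is described as the reverse of projectPoints, this version also includes lens distortion, but only the radial terms k1 and k2 (OpenCV's full model also has tangential and higher-order terms). Radial distortion has no closed-form inverse, so the sketch inverts it by fixed-point iteration; undistortPoints likewise inverts the distortion numerically, though its exact procedure may differ. The function names are illustrative, and K is assumed to have zero skew.

```python
import numpy as np


def project(X_cam, K, k1=0.0, k2=0.0):
    """Camera-frame Nx3 points -> pixel coordinates, with radial distortion only."""
    X_cam = np.asarray(X_cam, dtype=float)
    x = X_cam[:, 0] / X_cam[:, 2]              # normalized coordinates (divide by depth)
    y = X_cam[:, 1] / X_cam[:, 2]
    r2 = x * x + y * y
    f = 1.0 + k1 * r2 + k2 * r2 * r2            # radial distortion factor
    u = K[0, 0] * x * f + K[0, 2]               # apply K to the distorted coordinates
    v = K[1, 1] * y * f + K[1, 2]
    return np.stack([u, v], axis=1)


def unproject(uv, K, k1=0.0, k2=0.0, iters=20):
    """Pixel coordinates -> undistorted normalized coordinates (x, y) on the plane z = 1."""
    uv = np.asarray(uv, dtype=float)
    xd = (uv[:, 0] - K[0, 2]) / K[0, 0]         # apply K^-1 (zero skew assumed)
    yd = (uv[:, 1] - K[1, 2]) / K[1, 1]
    x, y = xd.copy(), yd.copy()                 # initial guess: no distortion
    for _ in range(iters):                      # fixed-point iteration on x = xd / f(r^2)
        r2 = x * x + y * y
        f = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / f, yd / f
    return np.stack([x, y], axis=1)
```

With k1 = k2 = 0 the two functions reduce exactly to the K and K^-1 equations above; with non-zero coefficients, unproject(project(P)) recovers the normalized coordinates of P as long as the distortion is moderate enough for the iteration to converge.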

Is there any opencv function to calculate reprojected points?

What is the procedure to calculate reprojected points, reprojected errors and mean reprojection error from the given world points (Original coordinates), intrinsic matrix, rotation matrices and translation vector?
Is there any built-in OpenCV function for that, or should we calculate it manually?
If we have to calculate manually, what is the best way to get reprojected points?
projectPoints projects 3D points to an image plane.
calibrateCamera returns the final re-projection error. calibrateCamera finds the camera intrinsic and extrinsic parameters from several views of a calibration pattern.
The function estimates the intrinsic camera parameters and extrinsic parameters for each of the views. The algorithm is based on [Zhang2000] [1] and [BouguetMCT] [2]. The coordinates of 3D object points and their corresponding 2D projections in each view must be specified. That may be achieved by using an object with a known geometry and easily detectable feature points. Such an object is called a calibration rig or calibration pattern, and OpenCV has built-in support for a chessboard as a calibration rig (see findChessboardCorners()).
The algorithm performs the following steps:
Compute the initial intrinsic parameters (the option only available for planar calibration patterns) or read them from the input parameters. The distortion coefficients are all set to zeros initially unless some of CV_CALIB_FIX_K? are specified.
Estimate the initial camera pose as if the intrinsic parameters have been already known. This is done using solvePnP().
Run the global Levenberg-Marquardt optimization algorithm to minimize the reprojection error, that is, the total sum of squared distances between the observed feature points imagePoints and the projected (using the current estimates for camera parameters and the poses) object points objectPoints. See projectPoints() for details. The function returns the final re-projection error.
[1] Z. Zhang. A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(11): 1330-1334.
[2] J.-Y. Bouguet. MATLAB calibration tool. http://www.vision.caltech.edu/bouguetj/calib_doc/
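If you want the reprojected points and errors yourself, the procedure is: project the world points with each view's pose, then compare with the detected image points. The NumPy-only sketch below ignores lens distortion; with distortion, cv2.projectPoints(objectPoints, rvec, tvec, cameraMatrix, distCoeffs) would be the projection step instead. It reports the per-point distances, the mean distance per view (similar in spirit to the error loop in the OpenCV Python calibration tutorial) and the overall RMS error, which is what calibrateCamera's return value measures. Function names are illustrative, and poses are given as rotation matrices (convert rvecs with Rodrigues first).

```python
import numpy as np


def reproject(object_points, R, t, K):
    """Project Nx3 world points with X_c = R X + t and pixels = K X_c / Z_c (no distortion)."""
    R = np.asarray(R, dtype=float)
    t = np.asarray(t, dtype=float).reshape(3)
    Xc = np.asarray(object_points, dtype=float) @ R.T + t
    uvw = Xc @ np.asarray(K, dtype=float).T
    return uvw[:, :2] / uvw[:, 2:3]               # divide by depth


def reprojection_errors(object_points_per_view, image_points_per_view, poses, K):
    """Return (per-point distances per view, mean distance per view, overall RMS)."""
    per_point, per_view_mean, sq_sum, n = [], [], 0.0, 0
    for obj, img, (R, t) in zip(object_points_per_view, image_points_per_view, poses):
        proj = reproject(obj, R, t, K)
        d = np.linalg.norm(proj - np.asarray(img, dtype=float), axis=1)  # pixel distances
        per_point.append(d)
        per_view_mean.append(d.mean())
        sq_sum += np.sum(d ** 2)
        n += len(d)
    rms = np.sqrt(sq_sum / n)                    # root-mean-square over all points of all views
    return per_point, per_view_mean, rms
```

The RMS figure is the one to compare against calibrateCamera's return value; the per-view means help spot individual bad views.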

Are a camera's extrinsic parameters expressed in a world coordinate frame?

This is a question about terminology used in computer graphics and vision. I want to construct a camera projection matrix, using 2D to 3D correspondences. From these correspondences I create a camera object. I am using a class in a library to represent the camera. It takes the following parameters:
// R is the orientation of the camera expressed in a world coordinate frame
// t is the position of the camera expressed in a world coordinate frame
The first part of my question is: are R and t, as defined above, the extrinsic parameters satisfying x=K[R|t]X? Or do they need to be converted (for example, transpose(R) is the extrinsic orientation, and -transpose(R)*t for position).
I am obtaining R and t using OpenCV's solvePnP function. The function returns R and t as follows:
rvec – Output rotation vector (see Rodrigues() ) that, together with tvec , brings points from the model coordinate system to the camera coordinate system.
tvec – Output translation vector.
The second part of my question is, based on the descriptions above, are the outputs equivalent to my camera's extrinsic parameters, or do they also need to be transformed (as previously defined)?
The camera projection matrix is typically defined as P = K [R|t], where R and t are the extrinsics, and not the camera orientation and location in the world coordinates.
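As for solvePnP, the documentation quoted in the question says that rvec (converted to a matrix with Rodrigues()) and tvec bring points from the model coordinate system to the camera coordinate system. They are therefore the extrinsics R and t of P = K [R|t], not the camera orientation and position in world coordinates; those are transpose(R) and -transpose(R)*t, as the question suggests.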
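For a camera class that wants the orientation and position in world coordinates, the conversion from world-to-camera extrinsics can be sketched in NumPy as follows (the function name is hypothetical). R_wc has the camera axes as its columns, expressed in world coordinates, and C is the camera centre, the world point that the extrinsics map to the camera origin.

```python
import numpy as np


def camera_pose_in_world(R, t):
    """Invert world-to-camera extrinsics (X_c = R X + t).

    Returns (R_wc, C): the camera orientation in world coordinates (R^T)
    and the camera centre in world coordinates (-R^T t).
    """
    R = np.asarray(R, dtype=float)
    t = np.asarray(t, dtype=float).reshape(3)
    R_wc = R.T
    C = -R.T @ t
    return R_wc, C
```

A quick sanity check is that R @ C + t is the zero vector: the camera centre lands at the origin of the camera frame.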
solvePnP takes image points (in image coordinates), object points (in model/world coordinates) and the camera intrinsics, and outputs rvec/tvec, from which we can construct the model-to-camera transform. In OpenGL this is usually called the modelview matrix.
Since it is difficult for a computer to understand its environment, we usually use easy-to-detect features such as AR markers and calibration boards, and take their coordinate frame as the world coordinate system (or the model coordinate system, if you prefer OpenGL terms). In this setting, the camera's extrinsics are indeed expressed relative to that world coordinate frame.
You can check the code of chapters 2 and 3 (the AR chapters) of the book Mastering OpenCV for a good explanation and practical code.
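The model-to-camera transform can be turned into an OpenGL-style modelview matrix. The sketch assumes the common convention difference between the two: OpenCV's camera frame has +Y pointing down and looks along +Z, while OpenGL's eye space has +Y up and looks along -Z, so the Y and Z axes are flipped. The function name is hypothetical; since OpenGL expects column-major storage, the result is typically transposed (or flattened in column-major order) before being passed to it.

```python
import numpy as np

# Flip Y and Z: OpenCV camera axes (x right, y down, z forward) -> OpenGL eye axes (x right, y up, z backward).
CV_TO_GL = np.diag([1.0, -1.0, -1.0, 1.0])


def modelview_from_extrinsics(R, t):
    """Build a 4x4 OpenGL-style modelview matrix from OpenCV extrinsics (R, t)."""
    M = np.eye(4)
    M[:3, :3] = np.asarray(R, dtype=float)
    M[:3, 3] = np.asarray(t, dtype=float).reshape(3)
    return CV_TO_GL @ M
```

A model point 5 units in front of an OpenCV camera (Z = +5) ends up at Z = -5 in eye space, which is in front of an OpenGL camera.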

Consistency of projecting points onto an undistorted image

I want to project a point in 3D space into 2D image coordinates. I have the calibrated intrinsics and extrinsics of the camera I'm using. I have the camera matrix K and distortion coefficients D. However, I want the projected image coordinates to be of the undistorted image.
From my research, I found two ways to do this.
Use OpenCV's getOptimalNewCameraMatrix function to compute the camera matrix K' of the undistorted image. Then use K' in OpenCV's projectPoints function, with the distortion parameters set to 0, to get the projected point.
Use the projectPoints function with the raw camera matrix K and the distortion coefficients D to get the projected point.
Should the output of both methods match?
I think something is missing from your reasoning.
The camera matrix K and the distortion coefficients D are the parameters used to undistort the image (if your lens distorts it, as a fisheye lens does). They are called the intrinsic camera parameters.
In computer graphics terms, these parameters define the view frustum; for example, they give you the focal length of the camera.
But they are not enough to perform the projection.
For the projection, if you think in computer graphics terms (OpenGL, for instance), you need the model-view-projection matrix. The model matrix specifies the position of the object in the world, the view matrix specifies the position of the camera, and the projection matrix specifies the frustum (field of view, perspective, etc.).
To transform points of the model from 3D to 2D (or vice versa), you need the projection and view matrices (you already have the model matrix, since you start from the 3D points). In computer vision, the view matrix corresponds to the extrinsic parameters.
So you also need the extrinsic parameters, i.e. the pose of the camera in the world. These are, for instance, the rvec and tvec that cv::projectPoints needs.
If you want to compute them, they are exactly the output of cv::solvePnP, which does the opposite of what you want to do: from some known 3D points and their known 2D projections on the image, it gives you the extrinsic parameters (from which you can build the view matrix via cv::Rodrigues, e.g. for an OpenGL/OpenCV augmented-reality application).
Last note: the intrinsic parameters are fixed for all the pictures you shoot with a camera (as long as you don't change the focal length), whereas the extrinsic parameters change every time you move the camera to take a picture from a different viewpoint (which changes the perspective, and therefore the 3D-2D projection you want to find).
Hope this helps!
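The two methods in the question can also be compared numerically. This NumPy sketch assumes K' equals K (getOptimalNewCameraMatrix generally returns a different matrix, depending on its alpha argument) and a single radial coefficient k1 with a hypothetical value. Method 1 gives coordinates in the undistorted image, method 2 in the original, distorted image, so in general they do not match; they coincide only where distortion vanishes, e.g. on the optical axis. Converting a method-2 result into method-1 coordinates is what cv2.undistortPoints does when its P argument is set to K'.

```python
import numpy as np


def project_pixel(X_cam, K, k1=0.0):
    """Project one camera-frame point to pixels with a single radial term k1.

    k1 = 0 mimics "distortion parameters set to 0" (method 1);
    k1 != 0 mimics passing the distortion coefficients (method 2).
    """
    x, y = X_cam[0] / X_cam[2], X_cam[1] / X_cam[2]   # normalized coordinates
    f = 1.0 + k1 * (x * x + y * y)                     # radial distortion factor
    return np.array([K[0, 0] * x * f + K[0, 2], K[1, 1] * y * f + K[1, 2]])


K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
K_new = K.copy()  # hypothetical stand-in for K' from getOptimalNewCameraMatrix
k1 = -0.2         # hypothetical barrel distortion

off_axis = np.array([1.0, 0.5, 3.0])
method1 = project_pixel(off_axis, K_new)   # coordinates in the undistorted image
method2 = project_pixel(off_axis, K, k1)   # coordinates in the original, distorted image
print(method1, method2)
```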
