I have a small confusion regarding the use of the solvePnP function in OpenCV.
I have the matrix for the intrinsic parameters of the camera and I have identified some keypoints in image and I am trying to estimate the extrinsic parameters for the calibration.
The documentation for solvePnP says:
cv2.solvePnP(objectPoints, imagePoints, cameraMatrix, distCoeffs[, rvec[, tvec[, useExtrinsicGuess[, flags]]]]) → retval, rvec, tvec
I am guessing my imagePoints parameter holds the keypoints I have detected, and these are specified in pixels as (x1, y1), (x2, y2), (x3, y3).
I am totally confused about the objectPoints. So, the documentation says:
objectPoints – Array of object points in the object coordinate space, 3xN/Nx3 1-channel or 1xN/Nx1 3-channel, where N is the number of points. vector&lt;Point3f&gt; can be also passed here.
How can I generate these object points from my image points? What does it mean when they say the object coordinate space in here?
The cv2.solvePnP() method is generally used for pose estimation; in other words, it estimates the position and orientation of a 3D object in a 2D image. For this you need to tag some key-points in the 3D model of the object (objectPoints) and also detect those same key-points in the 2D image (imagePoints).
You can consult this answer to see this technique applied to face pose estimation.
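As a minimal sketch of such a call (all the numbers below are made-up placeholders; in practice the 3D coordinates come from your object model and the 2D ones from your keypoint detector):

import numpy as np
import cv2
# Hypothetical 3D key-points measured on the object, in the object's own coordinate system (objectPoints).
object_points = np.array([[0., 0., 0.], [50., 0., 0.], [50., 40., 0.],
                          [0., 40., 0.], [25., 20., 10.], [10., 30., 5.]], dtype=np.float32)
# The same key-points detected in the image, in pixels (imagePoints).
image_points = np.array([[320., 240.], [410., 245.], [405., 320.],
                         [315., 315.], [365., 280.], [335., 300.]], dtype=np.float32)
# Intrinsics you already have from calibration (placeholder values).
camera_matrix = np.array([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])
dist_coeffs = np.zeros(5)  # assume no lens distortion for this sketch
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, camera_matrix, dist_coeffs)
# rvec and tvec now describe the object's pose relative to the camera.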
Related
From my understanding, undistortPoints takes a set of points on a distorted image, and calculates where their coordinates would be on an undistorted version of the same image. Likewise, projectPoints maps a set of object coordinates to their corresponding image coordinates.
However, I am unsure whether projectPoints maps the object coordinates to image points on the distorted image (i.e. the original image) or on one that has been undistorted (where straight lines stay straight)?
Furthermore, the OpenCV documentation for undistortPoints states that 'the function performs a reverse transformation to projectPoints()'. Could you please explain how this is so?
Quote from the 3.2 documentation for projectPoints():
Projects 3D points to an image plane.
The function computes projections of 3D points to the image plane given intrinsic and extrinsic camera parameters.
You have the parameter distCoeffs:
If the vector is empty, the zero distortion coefficients are assumed.
With no distortion, the equation is:

s [u, v, 1]^T = K [R | t] [X, Y, Z, 1]^T

with K the intrinsic matrix and [R | t] the extrinsic matrix, i.e. the transformation that maps a point from the object (or world) frame to the camera frame.
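As a quick sanity check of that equation (K, rvec, tvec and the 3D point below are placeholder values), you can apply it by hand and compare against projectPoints called with empty distortion coefficients:

import numpy as np
import cv2
K = np.array([[700., 0., 320.], [0., 700., 240.], [0., 0., 1.]])
rvec = np.array([0.1, -0.2, 0.05])
tvec = np.array([0.3, -0.1, 2.0])
R, _ = cv2.Rodrigues(rvec)
X_w = np.array([0.2, 0.1, 0.5])                  # a 3D point in the world/object frame
p = K @ (R @ X_w + tvec)                         # s * [u, v, 1]^T = K [R | t] [X, Y, Z, 1]^T
uv_manual = p[:2] / p[2]
uv_cv, _ = cv2.projectPoints(X_w.reshape(1, 1, 3), rvec, tvec, K, None)
print(uv_manual, uv_cv.ravel())                  # the two results should agree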
For undistortPoints(), you have the parameter R:
Rectification transformation in the object space (3x3 matrix). R1 or R2 computed by cv::stereoRectify can be passed here. If the matrix is empty, the identity transformation is used.
The reverse transformation is the operation where you compute for a 2D image point ([u, v]) the corresponding 3D point in the normalized camera frame ([x, y, z=1]) using the intrinsic parameters.
With the extrinsic matrix, you can get the point in the camera frame:

[X_c, Y_c, Z_c]^T = R [X, Y, Z]^T + t

The normalized camera frame is obtained by dividing by the depth:

x = X_c / Z_c,  y = Y_c / Z_c

Assuming no distortion, the image point is (with f_x, f_y, c_x, c_y the focal lengths and principal point from K):

u = f_x x + c_x,  v = f_y y + c_y

And the "reverse transformation", assuming no distortion:

x = (u - c_x) / f_x,  y = (v - c_y) / f_y
I have a camera facing the equivalent of a chessboard. I know the world 3d location of the points as well as the 2d location of the corresponding projected points on the camera image. All the world points belong to the same plane. I use solvePnP:
Matx33d camMat;
Matx41d distCoeffs;
Matx31d rvec;
Matx31d tvec;
std::vector<Point3f> objPoints;
std::vector<Point2f> imgPoints;
solvePnP(objPoints, imgPoints, camMat, distCoeffs, rvec, tvec);
I can then go from the 3d world points to the 2d image points with projectPoints:
std::vector<Point2f> projPoints;
projectPoints(objPoints, rvec, tvec, camMat, distCoeffs, projPoints);
projPoints are very close to imgPoints.
How can I do the reverse with a screen point that corresponds to a 3D world point lying in that same plane? I know that from a single view it's not possible in general to reconstruct the 3D location, but here the point is in a known plane, so it's really a 2D problem. I can calculate the inverse rotation matrix as well as the inverse translation vector, but then how should I proceed?
Matx33d rot;
Rodrigues(rvec, rot);                        // rotation vector -> 3x3 rotation matrix
Matx31d camera_rotation_vector;
Rodrigues(rot.t(), camera_rotation_vector);  // inverse rotation, back to a 3x1 vector
Matx31d camera_translation_vector = -rot.t() * tvec;
Suppose you calibrate your camera with objpoints-imgpoints pairs. The first holds the real-world 3D coordinates of the feature points on the calibration board, the second the 2D pixel locations of those feature points in each image, so both should be lists with one element per calibration board image. After the following line of Python code, you will have the calibration matrix mtx, each calibration board pose's rotation in rvecs, and its translation in tvecs.
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, np.zeros(5, 'float32'))
Now we can find any pixel's 3D coordinate, under one assumption: we need to define a reference frame. Let's take the 0th (first) calibration board as the reference, with its origin at (0, 0), the long axis of the board as the x-axis, the short one as the y-axis, and the board's surface as the Z=0 plane. Here is how we can create the projection matrix:
# projection matrix
Lcam=mtx.dot(np.hstack((cv2.Rodrigues(rvecs[0])[0],tvecs[0])))
Now we can pick any pixel location and a desired Z value. Since I want to project the (100, 100) pixel location onto the reference calibration board, I set Z=0.
px=100
py=100
Z=0
# solve lambda*[px, py, 1]^T = Lcam*[X, Y, Z, 1]^T for [X, Y, lambda], with Z fixed
X=np.linalg.inv(np.hstack((Lcam[:,0:2],np.array([[-1*px],[-1*py],[-1]])))).dot((-Z*Lcam[:,2]-Lcam[:,3]))
Now we have the X and Y coordinates of the (px, py) pixel: they are X[0] and X[1]. The last element of X is the lambda (scale) factor. As a result, we can say that the pixel at (px, py) falls on the coordinate (X[0], X[1]) on the 0th calibration board's surface.
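To double-check the result, you can push the recovered point back through the same projection matrix and confirm it lands on (px, py) again (this reuses the Lcam, X and Z variables defined above):

p = Lcam.dot(np.array([X[0], X[1], Z, 1.0]))   # project (X, Y, Z) with the projection matrix
print(p[0] / p[2], p[1] / p[2])                # should print approximately 100, 100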
This question seems to be a duplicate of another Stack Overflow question in which the asker nicely provides the solution: Computing x,y coordinate (3D) from image point
Related posts:
Exact definition of the matrices in OpenCv StereoRectify
What is the camera frame of the rvec and tvec calculated by cv::calibrateCamera? Is it that of the original (distorted) camera or of the undistorted one? Does the camera coordinate frame change when the image is undistorted (not rectified)?
What is the R1 from cv::stereoRectify()? To my understanding, R1 rotates the left camera frame (O_c) to a frontal-parallel camera frame (O_cr) so that the image is rectified (row-aligned with the right one). In other words, applying R1 to 3D points in O_cr will give points in O_c (or is it the other way around?).
A few posts and the OpenCV book have tried to explain it, but I just want to confirm that I understand it correctly, as the explanation of rotating the image plane is confusing to me.
Thanks!
I can only reply to the first question.
rvec and tvec describe the camera pose expressed in your calibration pattern's coordinate system. For each calibration pose you get an individual rvec and tvec.
Undistortion does not influence the camera position. Pixel positions are modified by using the radial and tangential distortion parameters (distCoeffs) resulting from the camera calibration.
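For illustration (the camera matrix and distortion coefficients below are placeholders), this is how a pixel position changes under undistortion, while rvec and tvec stay whatever calibration returned:

import numpy as np
import cv2
K = np.array([[700., 0., 320.], [0., 700., 240.], [0., 0., 1.]])
dist = np.array([-0.25, 0.1, 0., 0., 0.])          # made-up radial distortion
pt = np.array([[[600.0, 50.0]]])
pt_undist = cv2.undistortPoints(pt, K, dist, P=K)  # P=K returns pixel coordinates in the undistorted image
print(pt.ravel(), "->", pt_undist.ravel())         # the pixel moves; the camera pose does not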
calibrateCamera() provides rvec, tvec, distCoeff and cameraMatrix whereas solvePnP() takes cameraMatrix, distCoeff as input and provides rvec, tvec as output. What is the difference between these two functions?
cv::calibrateCamera(...)
The function estimates the following parameters of a monocular camera from several views of a calibration pattern whose geometry is known (e.g. a chessboard):
The linear intrinsic parameters: the focal lengths in terms of pixels (these are basically scale factors), the principal point, which would ideally be at the center of the image, and sometimes a skew coefficient between the x and y axes (but this is often zero).
The non-linear intrinsic parameters: the previously mentioned parameters form the linear camera matrix, but there are also some non-linear parameters in the transformation from the 3D camera frame to the 2D image plane, i.e. the lens distortion.
The extrinsic parameters: the transformation matrix between the 3D world and 3D camera coordinate systems.
The estimation of the above-mentioned parameters is usually based on 2D-3D correspondences. The algorithm detects some 2D points in the image (e.g. chessboard corners) for which the corresponding 3D object points are specified (known 3D geometry). In the simplest case it performs the following steps (which can vary depending on the flags of cv::calibrateCamera(..., int flags, ...)):
Computes the linear intrinsic parameters and sets the non-linear ones to zero.
Estimates the initial camera pose (extrinsics) from the approximated intrinsics. This is done using cv::solvePnP(...).
Runs the Levenberg-Marquardt optimization algorithm to minimize the re-projection error between the detected 2D image points and the 2D projections of the 3D object points. This is done using cv::projectPoints(...).
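The third step is essentially the following computation, sketched here for a single view (object_points, image_points, rvec, tvec, camera_matrix and dist_coeffs are assumed to come from your own calibration data):

import numpy as np
import cv2
def reprojection_error(object_points, image_points, rvec, tvec, camera_matrix, dist_coeffs):
    # Project the known 3D points with the current parameter estimate...
    projected, _ = cv2.projectPoints(object_points, rvec, tvec, camera_matrix, dist_coeffs)
    # ...and measure how far they fall from the detected 2D points (RMS, in pixels).
    diff = projected.reshape(-1, 2) - np.asarray(image_points).reshape(-1, 2)
    return np.sqrt((diff ** 2).sum(axis=1).mean())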
cv::solvePnP(...)
At this point, I have also implicitly answered what the role of cv::solvePnP(...) is, as it is part of cv::calibrateCamera(...).
Once you have the intrinsics of a camera, you can assume that these will never change (unless you change the optics or the zoom). The extrinsics, on the other hand, can change, i.e. you can rotate the camera or move it to another location. Notice that the scenario of changing an object's pose relative to the camera is very similar. And this is what cv::solvePnP(...) is used for.
The function estimates the object pose given:
A set of 3D object points in a model coordinate system (can be the 3D world as well),
Their 2D projections on the image plane,
The linear and non-linear intrinsic parameters.
The output of cv::solvePnP(...) is given as a rotation vector (rvec) together with a translation vector (tvec) that bring the 3D object points from the model coordinate system to the 3D camera coordinate system.
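In code, bringing a point from the model frame to the camera frame amounts to the following (rvec and tvec stand in for whatever solvePnP returned; the values here are placeholders):

import numpy as np
import cv2
rvec = np.array([0.2, -0.1, 0.3])       # rotation vector from solvePnP (placeholder)
tvec = np.array([0.05, 0.0, 1.5])       # translation vector from solvePnP (placeholder)
R, _ = cv2.Rodrigues(rvec)              # rotation vector -> 3x3 rotation matrix
X_model = np.array([0.1, 0.05, 0.0])    # hypothetical 3D point in the model coordinate system
X_camera = R @ X_model + tvec           # the same point expressed in the camera coordinate system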
calibrateCamera (doc) estimates the intrinsic parameters (i.e. camera matrix and distortion coefficients) of a given camera. This function requires you to provide as input N sets of 2D-3D correspondences, associated with N images taken with the same camera from varying viewpoints (typically N=30; see this tutorial on the topic). The function returns the camera matrix and distortion coefficients for the considered camera. Although they are usually not used afterwards, the extrinsic parameters (i.e. position and orientation) are also estimated, hence the function returns one pair of rvec and tvec for each of the N input images.
solvePnP (doc) estimates the extrinsic parameters for a given camera image. This function requires you to provide a set of 2D-3D correspondences, associated with a single image taken with a camera whose intrinsic parameters are known. The function returns a single pair of rvec and tvec corresponding to the input image.
calibrateCamera() provides rvec, tvec, distCoeffs and cameraMatrix. distCoeffs describe the distortion of the image, and cameraMatrix provides the image center (Cx and Cy) and the focal lengths (Fx and Fy). These are called the intrinsic parameters, and unless you change the aperture/focus of the camera they will remain the same. [It also provides rvec and tvec; I don't know yet what possible use they have. They describe the position of the camera in the real world, and are also known as the extrinsic parameters.]
solvePnP() takes cameraMatrix and distCoeffs as input and provides rvec and tvec. Using Cx, Cy, Fx and Fy (together with the 2D-3D correspondences), it can estimate the current position and orientation of the camera, i.e. the extrinsic parameters.
In other words, first use calibrateCamera() to obtain the cameraMatrix and distCoeffs. Use them in solvePnP(), and it will tell you the rotation (rvec) and translation (tvec) of the camera as you move it with respect to your real-world object (a marker, for example).
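Put together, the workflow described in this answer looks roughly like the sketch below. To keep it self-contained, it fakes the chessboard detections by projecting a synthetic 9x6 board with a known camera; with real images you would gather objpoints/imgpoints with cv2.findChessboardCorners instead:

import numpy as np
import cv2
board = np.zeros((9 * 6, 3), np.float32)
board[:, :2] = np.mgrid[0:9, 0:6].T.reshape(-1, 2)           # 9x6 board, square size 1
K_true = np.array([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])
rng = np.random.default_rng(0)
objpoints, imgpoints = [], []
for _ in range(15):                                          # 15 synthetic "views"
    rvec = rng.uniform(-0.3, 0.3, 3)
    tvec = np.array([-4.0, -3.0, 15.0]) + rng.uniform(-1.0, 1.0, 3)
    img, _ = cv2.projectPoints(board, rvec, tvec, K_true, None)
    objpoints.append(board)
    imgpoints.append(img.astype(np.float32))
# Step 1: calibrateCamera gives the intrinsics (cameraMatrix, distCoeffs)
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, (640, 480), None, None)
# Step 2: with the intrinsics fixed, solvePnP recovers the pose of the board in a new view
img_new, _ = cv2.projectPoints(board, np.array([0.1, -0.2, 0.05]), np.array([-4.0, -3.0, 12.0]), K_true, None)
ok, rvec_est, tvec_est = cv2.solvePnP(board, img_new, mtx, dist)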
Is it possible to use FindExtrinsicCameraParams2 to get the pose matrix instead of using homography decomposition with SURF feature detection ?
Yes, it is, assuming you have a calibrated camera and a set of points whose positions are known in world space at t = 0 and in image space in the current frame. If you know both of those, then the call looks like this:
FindExtrinsicCameraParams2(objectPoints, imagePoints, cameraMatrix, distCoeffs, rvec, tvec, useExtrinsicGuess=0)
objectPoints are the points in world coordinates of the object you are looking at at t == 0.
imagePoints are the current image points corresponding to those world coordinates.
cameraMatrix is your camera matrix.
distCoeffs are your distortion coefficients (to ignore those, just pass all 0's).
rvec and tvec will be filled by the function so they contain your current rotation and translation vectors.
Once you have the contents of rvec and tvec you can convert rvec to a rotation matrix using Rodrigues and then combine the two to get your pose matrix.
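A short sketch of that last step, using the modern cv2 API (the rvec and tvec values below are placeholders for what the function fills in):

import numpy as np
import cv2
rvec = np.array([0.1, -0.2, 0.3])   # placeholder rotation vector
tvec = np.array([0.05, 0.0, 1.2])   # placeholder translation vector
R, _ = cv2.Rodrigues(rvec)          # 3x1 rotation vector -> 3x3 rotation matrix
pose = np.eye(4)                    # 4x4 homogeneous pose matrix [R | t; 0 0 0 1]
pose[:3, :3] = R
pose[:3, 3] = tvec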