Is it possible to use FindExtrinsicCameraParams2 to get the pose matrix instead of using homography decomposition with SURF feature detection ?
Yes it is assuming you have a calibrated camera and have a set of points whose position is known in world space at t = 0 and image space in the current frame. If you know both of those then the call looks like this
FindExtrinsicCameraParams2(objectPoints, imagePoints, cameraMatrix, distCoeffs, rvec, tvec, useExtrinsicGuess=0)
objectPoints are the points in world coordinates of the object you
are looking at at t==0.
imagePoints are the current image points corresponding to those world
coordinates.
cameraMatrix is your camera matrix
distCoeffs are your distortion coefficients (to ignore those just
pass all 0's).
rvec and tvec will be filled by the function so they contain your
current rotation and translation vectors.
Once you have the contents of rvec and tvec you can convert rvec to a rotation matrix using Rodrigues and then combine the two to get your pose matrix.
Related
I have a small confusion regarding the use of the solvePnP function in OpenCV.
I have the matrix for the intrinsic parameters of the camera and I have identified some keypoints in image and I am trying to estimate the extrinsic parameters for the calibration.
The documentation for solvePnP says:
cv2.solvePnP(objectPoints, imagePoints, cameraMatrix,
distCoeffs[, rvec[, tvec[, useExtrinsicGuess[, flags]]]]) → retval, rvec, tvec
I am guesisng my imagePoints parameters are the keypoints I have detected. and these arte specified in pixels as (x1, y1), (x2, y2), (x3, y3).
I am totally confused about the objectPoints. So, the documentation says:
objectPoints – Array of object points in the object coordinate space,
3xN/Nx3 1-channel or 1xN/Nx1 3-channel, where N is the number of points.
vector<Point3f> can be also passed here.
How can I generate these object points from my image points? What does it mean when they say the object coordinate space in here?
The cv2.solvePnP() method is generally used in pose estimation, or in other words, it can be used to estimate the orientation of a 3D object in a 2D image. So for this you need to tag some key-points in the 3D model of the object(objectPoints) and also detect those key-points in the 2D image(imagePoints).
You can consult this answer, to get the application of this technique for face pose estimation.
I have a camera facing the equivalent of a chessboard. I know the world 3d location of the points as well as the 2d location of the corresponding projected points on the camera image. All the world points belong to the same plane. I use solvePnP:
Matx33d camMat;
Matx41d distCoeffs;
Matx31d rvec;
Matx31d tvec;
std::vector<Point3f> objPoints;
std::vector<Point2f> imgPoints;
solvePnP(objPoints, imgPoints, camMat, distCoeffs, rvec, tvec);
I can then go from the 3d world points to the 2d image points with projectPoints:
std::vector<Point2f> projPoints;
projectPoints(objPoints, rvec, tvec, camMat, distCoeffs, projPoints);
projPoints are very close to imgPoints.
How can I do the reverse with a screen point that corresponds to a 3d world point that belongs to the same plane. I know that from a single view, it's not possible to reconstruct the 3d location but here I'm in the same plane so it's really a 2d problem. I can calculate the reverse rotation matrix as well as the reverse translation vector but then how can I proceed?
Matx33d rot;
Rodrigues(rvec, rot);
Matx33d camera_rotation_vector;
Rodrigues(rot.t(), camera_rotation_vector);
Matx31d camera_translation_vector = -rot.t() * tvec;
Suppose you calibrate your camera by objpoints-imgpoints pair. Note first is real world 3-d coordinate of featured points on calibration board, the second one is 2-d pixel location of featured points in each image. So both of them should be the list where it has the number of calibration board images element. After following line of Python code, you will have calibration matrix mtx, each calibration board's rotations rvecs, and its translations tvecs.
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, np.zeros(5,'float32'),flags=cv2.CALIB_USE_INTRINSIC_GUESS )
Now we can find any pixel's 3D coordinate under the assumption. That assumption is we need to define some reference point. Let's assume our reference is 0th (first) calibration board, where its pivot point is at 0,0 where the long axis of the calibration board is x, and the short one is y-axis, also the surface of calibration board shows Z=0 plane. Here is how we can create a projection matrix.
# projection matrix
Lcam=mtx.dot(np.hstack((cv2.Rodrigues(rvecs[0])[0],tvecs[0])))
Now we can define any pixel location and desired Z value. Note since I want to project (100,100) pixel location on the reference calibration board, I set Z=0.
px=100
py=100
Z=0
X=np.linalg.inv(np.hstack((Lcam[:,0:2],np.array([[-1*px],[-1*py],[-1]])))).dot((-Z*Lcam[:,2]-Lcam[:,3]))
Now we have X and Y coordinate of (px,py) pixel, it is X[0], X[1] .
the last element of X is lambda factor. As a result we can say, pixe on (px,py) location drops on X[0],X[1] coordinate on the 0th calibration board's surface.
This question seems to be a duplicate of another Stackoverflow question in which the asker provides nicely the solution. Here is the link: Answer is here: Computing x,y coordinate (3D) from image point
I am trying to determine camera position in world coordinates, relative to a fiducial position based on fiducial marker found in a scene.
My methodology for determining the viewMatrix is described here:
Determine camera pose?
I have the rotation and translation, [R|t], from the trained marker to the scene image. Given camera calibration training, and thus the camera intrinsic results, I should be able to discern the cameras position in world coordinates based on the perspective & orientation of the marker found in the scene image.
Can anybody direct me to a discussion or example similar to this? I'd like to know my cameras position based on the fiducial marker, and I'm sure that something similar to this has been done before, I'm just not searching the correct keywords.
Appreciate your guidance.
What do you mean under world coordinates? If you mean object coordinates then you should use the inverse transformation of solvepnp's result.
Given a view matrix [R|t], we have that inv([R|t]) = [R'|-R'*t], where R' is the transpose of R. In OpenCV:
cv::Mat rvec, tvec;
cv::solvePnP(objectPoints, imagePoints, intrinsics, distortion, rvec, tvec);
cv::Mat R;
cv::Rodrigues(rvec, rotation);
R = R.t(); // inverse rotation
tvec = -R * tvec; // translation of inverse
// camPose is a 4x4 matrix with the pose of the camera in the object frame
cv::Mat camPose = cv::Mat::eye(4, 4, R.type());
R.copyTo(camPose.rowRange(0, 3).colRange(0, 3)); // copies R into camPose
tvec.copyTo(camPose.rowRange(0, 3).colRange(3, 4)); // copies tvec into camPose
Update #1:
Result of solvePnP
solvePnP estimates the object pose given a set of object points (model coordinates), their corresponding image projections (image coordinates), as well as the camera matrix and the distortion coefficients.
The object pose is given by two vectors, rvec and tvec. rvec is a compact representation of a rotation matrix for the pattern view seen on the image. That is, rvec together with the corresponding tvec brings the fiducial pattern from the model coordinate space (in which object points are specified) to the camera coordinate space.
That is, we are in the camera coordinate space, it moves with the camera, and the camera is always at the origin. The camera axes have the same directions as image axes, so
x-axis is pointing in the right side from the camera,
y-axis is pointing down,
and z-axis is pointing to the direction of camera view
The same would apply to the model coordinate space, so if you specified the origin in upper right corner of the fiducial pattern, then
x-axis is pointing to the right (e.g. along the longer side of your pattern),
y-axis is pointing to the other side (e.g. along the shorter one),
and z-axis is pointing to the ground.
You can specify the world origin as the first point of the object points that is the first object is set to (0, 0, 0) and all other points have z=0 (in case of planar patterns). Then tvec (combined rvec) points to the origin of the world coordinate space in which you placed the fiducial pattern. solvePnP's output has the same units as the object points.
Take a look at to the following: 6dof positional tracking. I think this is very similar as you need.
calibrateCamera() provides rvec, tvec, distCoeff and cameraMatrix whereas solvePnP() takes cameraMatrix, distCoeff as input and provides rvec, tvec as output. What is the difference between these two functions?
cv::calibrateCamera(...)
The function estimates the following parameters of a monocular camera from several views of a calibration pattern. The geometry of this pattern is usually known (i.e. it can be a chessboard):
The linear intrinsic parameters: the focal lengths in terms of pixels (these are basically scale factors), the principal point which would be ideally in the center of the image, and sometimes a skew coefficient between the x and the y axis (but this is often zero).
The non-linear intrinsic parameters: the previously mentioned parameters are forming the linear camera matrix, but there are also some non-linear parameters in the tranformation from the 3D camera to the 2D image plane, i.e. the lens distortion.
The extrinsic parameters: the tranformation matrix between the 3D world and 3D camera coordinate systems.
The estimation of the above mentioned parameters is usually based on 2D-3D correspondences. The algorithm detects some 2D points in the image (i.e. chessboard) for what the corresponding 3D object points are specified (known 3D geometry). It performs the following steps in the simplest case (can vary on the flags of cv::calibrateCamera(..., int flags, ...)):
Computes the linear intrinsic parameters and considers the non-linear ones to zero.
Estimates the initial camera pose (extrinsics) in function of the approximated intrinsics. This is done using cv::solvePnP(...).
Performs the Levenberg-Marquardt optimization algorithm to minimize the re-projection error between the detected 2D image points and 2D projections of the 3D object points. This is done using cv::projectPoints(...).
cv::solvePnP(...)
At this point, I also answered implicitly the role of cv::solvePnP(...) as this is the part of cv::calibrateCamera(...).
Once you have the intrinsics of a camera, you can assume that these will never change (except you change the optics or zooming). On the other hand the extrinsics can be changed, i.e. you can rotate the camera or put it to another location. You should see that the scenario of changing an object's pose to the camera is very similar in this case. And this is what the cv::solvePnP(...) is used for.
The function estimates the object pose given:
A set of 3D object points in a model coordinate system (can be the 3D world as well),
Their 2D projections on the image plane,
The linear and non-linear intrinsic parameters.
The output of cv::solvePnP(...) is given as a rotation vector (rvec) together with a translation vector (tvec) that bring the 3D object points from the model coordinate system to the 3D camera coordinate system.
calibrateCamera (doc) estimates intrinsics coefficients (i.e. camera matrix and distortion coefficients) for a given camera. This function requires you to provide as input N sets of 2D-3D correspondences, associated to N images taken with the same camera from varying viewpoints (typically N=30, see this tutorial on this topic). The function returns the camera matrix and distortion coefficients for the considered camera. Although those are usually not used, the extrinsics parameters (i.e. position and orientation) are also estimated, hence the function returns one pair of rvec and tvec for each of the N input images.
solvePnP (doc) estimates extrinsics parameters for a given camera image. This function requires you to provide a set of 2D-3D correspondences, associated to a single image taken with a camera with known intrinsics parameters. The function returns a single pair of rvec and tvec, corresponding to the input image.
calibrateCamera() provides rvec, tvec, distCoeff, cameraMatrix ---- distCoeffs are related to distortion of the image and cameraMatrix provides the center of image(Cx and Cy) and focal length (Fx and Fy) (projection center). These are called intrinsic parameters. Unless you change the aperture/focus of the camera they will remain the same. [it also provides rvec and tvec, I don't know yet now what can be any possible use of it. These are the position of the camera in the real world. rvec and tvec are also known as extrinsic parameters]
solvePnP() takes cameraMatrix, distCoeff as input and provides rvec, tvec --- Using the Cx, Cy, Fx, Fy it can estimate the current position of the camera i.e. the extrinsic parameters.
In other words, first use calibrateCamera() to obtain the CameraMatrix and distCoeff. Use them in solvePNP() and it will tell you the rotation (rvec) and translation (tvec) of the camera as you move the camera with respect to your real world object (with some marker as you can presume).
I have a rectangular target of known dimensions and location on a wall, and a mobile camera on a robot. As the robot is driving around the room, I need to locate the target and compute the location of the camera and its pose. As a further twist, the camera's elevation and azimuth can be changed using servos. I am able to locate the target using OpenCV, but I am still fuzzy on calculating the camera's position (actually, I've gotten a flat spot on my forehead from banging my head against a wall for the last week). Here is what I am doing:
Read in previously computed camera intrinsics file
Get the pixel coordinates of the 4 points of the target rectangle from the contour
Call solvePnP with the world coordinates of the rectangle, the pixel coordinates, the camera matrix and the distortion matrix
Call projectPoints with the rotation and translation vectors
???
I have read the OpenCV book, but I guess I'm just missing something on how to use the projected points, rotation and translation vectors to compute the world coordinates of the camera and its pose (I'm not a math wiz) :-(
2013-04-02
Following the advice from "morynicz", I have written this simple standalone program.
#include <Windows.h>
#include "opencv\cv.h"
using namespace cv;
int main (int argc, char** argv)
{
const char *calibration_filename = argc >= 2 ? argv [1] : "M1011_camera.xml";
FileStorage camera_data (calibration_filename, FileStorage::READ);
Mat camera_intrinsics, distortion;
vector<Point3d> world_coords;
vector<Point2d> pixel_coords;
Mat rotation_vector, translation_vector, rotation_matrix, inverted_rotation_matrix, cw_translate;
Mat cw_transform = cv::Mat::eye (4, 4, CV_64FC1);
// Read camera data
camera_data ["camera_matrix"] >> camera_intrinsics;
camera_data ["distortion_coefficients"] >> distortion;
camera_data.release ();
// Target rectangle coordinates in feet
world_coords.push_back (Point3d (10.91666666666667, 10.01041666666667, 0));
world_coords.push_back (Point3d (10.91666666666667, 8.34375, 0));
world_coords.push_back (Point3d (16.08333333333334, 8.34375, 0));
world_coords.push_back (Point3d (16.08333333333334, 10.01041666666667, 0));
// Coordinates of rectangle in camera
pixel_coords.push_back (Point2d (284, 204));
pixel_coords.push_back (Point2d (286, 249));
pixel_coords.push_back (Point2d (421, 259));
pixel_coords.push_back (Point2d (416, 216));
// Get vectors for world->camera transform
solvePnP (world_coords, pixel_coords, camera_intrinsics, distortion, rotation_vector, translation_vector, false, 0);
dump_matrix (rotation_vector, String ("Rotation vector"));
dump_matrix (translation_vector, String ("Translation vector"));
// We need inverse of the world->camera transform (camera->world) to calculate
// the camera's location
Rodrigues (rotation_vector, rotation_matrix);
Rodrigues (rotation_matrix.t (), camera_rotation_vector);
Mat t = translation_vector.t ();
camera_translation_vector = -camera_rotation_vector * t;
printf ("Camera position %f, %f, %f\n", camera_translation_vector.at<double>(0), camera_translation_vector.at<double>(1), camera_translation_vector.at<double>(2));
printf ("Camera pose %f, %f, %f\n", camera_rotation_vector.at<double>(0), camera_rotation_vector.at<double>(1), camera_rotation_vector.at<double>(2));
}
The pixel coordinates I used in my test are from a real image that was taken about 27 feet left of the target rectangle (which is 62 inches wide and 20 inches high), at about a 45 degree angle. The output is not what I'm expecting. What am I doing wrong?
Rotation vector
2.7005
0.0328
0.4590
Translation vector
-10.4774
8.1194
13.9423
Camera position -28.293855, 21.926176, 37.650714
Camera pose -2.700470, -0.032770, -0.459009
Will it be a problem if my world coordinates have the Y axis inverted from that of OpenCV's screen Y axis? (the origin of my coordinate system is on the floor to the left of the target, while OpenCV's orgin is the top left of the screen).
What units is the pose in?
You get the translation and rotation vectors from solvePnP, which are telling where is the object in camera's coordinates. You need to get an inverse transform.
The transform camera -> object can be written as a matrix [R T;0 1] for homogeneous coordinates. The inverse of this matrix would be, using it's special properties, [R^t -R^t*T;0 1] where R^t is R transposed. You can get R matrix from Rodrigues transform. This way You get the translation vector and rotation matrix for transformation object->camera coordiantes.
If You know where the object lays in the world coordinates You can use the world->object transform * object->camera transform matrix to extract cameras translation and pose.
The pose is described either by single vector or by the R matrix, You surely will find it in Your book. If it's "Learning OpenCV" You will find it on pages 401 - 402 :)
Looking at Your code, You need to do something like this
cv::Mat R;
cv::Rodrigues(rotation_vector, R);
cv::Mat cameraRotationVector;
cv::Rodrigues(R.t(),cameraRotationVector);
cv::Mat cameraTranslationVector = -R.t()*translation_vector;
cameraTranslationVector contains camera coordinates. cameraRotationVector contains camera pose.
It took me forever to understand it, but the pose meaning is the rotation over each axes - x,y,z.
It is in radians. The values are between Pie to minus Pie (-3.14 - 3.14)
Edit:
I've might been mistaken. I read that the pose is the vector which indicates the direction of the camera, and the length of the vector indicates how much to rotate the camera around that vector.