I am trying to verify a solution to deprojecting a pixel point (u,v) into a 3D world location (x,y,z) using only the camera's extrinsic rotation and translation in addition to (u,v).
The proposed solution
I have modeled the problem in Unreal, where I have a virtual camera with world position (1077,1133,450) and rotation yaw=90, pitch=345, roll=0 degrees. I have an object of known 3D position (923,2500,0) seen by the 1280x720 camera at pixel location (771,426) or frame center position (131,-66).
The transpose of my rotation matrix is:
[[ 5.91458986e-17 9.65925826e-01 -0.00000000e+00]
[-1.00000000e+00 6.12323400e-17 0.00000000e+00]
[-1.58480958e-17 -2.58819045e-01 9.65925826e-01]]
My Tx_Ty_Tz matrix is:
[[-1094.39396119]
[ 1077. ]
[ -141.42464373]]
My dx_dy_dz matrix is
[[ -63.75110454]
[-131. ]
[ 18.0479828 ]]
And I end up with location (-1593,50,0) as the deprojected world coordinate, which is clearly wrong. Is one of my matrices incorrectly calculated? If not, is the method provided flawed / incomplete?
The proposed solution in the link does not appear to be accurate, as the intrinsic camera matrix should be required for deprojection.
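For what it's worth, the frame-center offsets (131, -66) are raw pixel offsets, while the deprojection formulas (see the closed-form answer further down) expect centered-normalized coordinates, i.e. the pixel offset divided by the focal length in pixels. Below is a minimal numpy sketch of that conversion, assuming a hypothetical 90° horizontal FOV for the 1280x720 Unreal camera purely for illustration; the real camera FOV or intrinsic matrix must be substituted.
import numpy as np

# Hypothetical intrinsics for the 1280x720 camera: a 90 deg horizontal FOV is
# assumed here only for illustration; use the real Unreal camera FOV.
width, height = 1280, 720
fov_h = np.deg2rad(90.0)
fx = fy = (width / 2.0) / np.tan(fov_h / 2.0)

def to_normalized(px, py):
    # Convert a raw pixel location into centered-normalized coordinates (u, v).
    u = (px - width / 2.0) / fx
    v = (py - height / 2.0) / fy
    return u, v

u, v = to_normalized(771, 426)   # the pixel from the question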
I have a camera matrix world of a 3D scene (let's call it A):
[[ 6.86118087e-01 -7.27490186e-01 1.11022302e-16 0.00000000e+00]
[ 3.82193007e-01 3.60457832e-01 8.50881106e-01 0.00000000e+00]
[-6.19007654e-01 -5.83804917e-01 5.25358300e-01 0.00000000e+00]
[-2.89658661e+00 5.19178303e-01 -3.50478367e-02 1.00000000e+00]]
And, I have 1000 2D images of the scene. For each image, I have pose (camera to world) as follows:
[[-9.55421e-01 1.19616e-01 -2.69932e-01 2.65583e+00]
[ 2.95248e-01 3.88339e-01 -8.72939e-01 2.98160e+00]
[ 4.07581e-04 -9.13720e-01 -4.06343e-01 1.36865e+00]
[ 0.00000e+00 0.00000e+00 0.00000e+00 1.00000e+00]]
I want to find the image that is closest to A by comparing pose matrix. How do I compare A with the camera pose of each image?
What is closest?
Just a little background: we asked annotators to write descriptions of an object in a 3D scene. Along with that, we also captured camera parameters, such as the matrix world (see A above), center, lookat, and dof, so that we can estimate where in the 3D scene they were looking when they wrote the description. So now I am trying to create a training set of images (one image for each scene, from the 1000 available) and would like to find the image that best matches the recorded camera parameters.
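One way to do the comparison, sketched below, is to score each image by a translation distance plus a rotation (geodesic) distance between the poses. This is only a sketch under two assumptions: both matrices are expressed as camera-to-world transforms with the translation in the last column (the matrix world A above appears to keep it in the last row, so it may need a transpose first), and the relative weight of rotation versus translation is an arbitrary choice.
import numpy as np

def pose_distance(pose_a, pose_b, rot_weight=1.0):
    # Both arguments are 4x4 camera-to-world matrices with translation in the
    # last column (transpose A first if it stores translation in the last row).
    trans_dist = np.linalg.norm(pose_a[:3, 3] - pose_b[:3, 3])

    # Geodesic angle (radians) between the two rotations: angle of R_a^T R_b.
    r_rel = pose_a[:3, :3].T @ pose_b[:3, :3]
    cos_angle = np.clip((np.trace(r_rel) - 1.0) / 2.0, -1.0, 1.0)
    rot_dist = np.arccos(cos_angle)

    return trans_dist + rot_weight * rot_dist

# best_index = min(range(len(image_poses)), key=lambda i: pose_distance(A, image_poses[i]))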
Context: I have a big 2 m x 2 m arena with 4 ArUco markers printed on it; their positions from one corner of the arena are known and fixed. I also have another ArUco marker on a robot moving over this arena. P.S. The known positions are in 2D.
Problem: I want to find the position of the robot in the arena (with respect to the known corner of the arena). I am using Python: first I detect the markers in the image using detectMarkers(), then I estimate the pose of each marker. The tvec values returned by the pose-estimation function give the position of a marker with respect to the camera coordinate system. That works fine if the camera is perpendicular to the arena, but when the camera is kept at an angle there is a large error in the position.
Is my approach right? Assuming the camera is calibrated well, what is the source of error?
import numpy as np
import cv2
from cv2 import aruco

# corners, ids come from aruco.detectMarkers(); mtx, dist from camera calibration.
rvec, tvec, _ = aruco.estimatePoseSingleMarkers(corners, actual_size, mtx, dist)
cv2.imshow('img', img)

index = np.where(ids == 0)                                 # getting the known ArUco marker
rotation_matrix = np.eye(4)                                # homogeneous marker->camera transform
rotation_matrix[:3, :3], _ = cv2.Rodrigues(rvec[index])    # computing the rotation matrix
rotation_matrix[:3, 3] = tvec[index].ravel()               # adding the translation values to it
inverse_rot = np.linalg.inv(rotation_matrix)               # inverting: camera -> reference-marker transform

for i, j, k in zip(ids, tvec, rvec):
    print(i, 'POS:', j)                                    # prints id and tvec values
    pt = np.ones((4, 1))                                   # homogeneous point
    pt[:3] = j.reshape(3, 1)
    rot_point = np.dot(inverse_rot, pt)                    # homogeneous matrix . tvec values
    print(rot_point[:3])                                   # the new position, in the reference-marker frame
    print(np.sqrt(rot_point[0]**2 + rot_point[1]**2))      # planar distance from the known marker
rotation_matrix is a 4x4 homogeneous matrix containing the rotation and translation of the known marker. Its inverse is used to transfer the coordinate system from the camera system to that of the known marker on the arena (so that marker becomes the origin), converting the other points (tvecs in the camera system) into the marker system.
Homogeneous Coordinate transformation
I have the readings from a gyroscope attached to a camera describing the orientation of the camera in 3D (say with 3 Euler angles).
I take a picture (of say a flat plane) from this pose. After which, I want to transform the image to another image, as though it has been taken with the camera being perpendicular to the plane itself.
How would I do something like this in OpenCV? Can someone point me in the correct direction?
You can checkout how to calculate the rotation matrix using the roll-pitch-yaw angles here: http://planning.cs.uiuc.edu/node102.html
A Transformation matrix is T = [R t; 0 1] (in matlab notation)
Here, you can place the translation as a 3x1 vector in 't' and the calculated rotation matrix in 'R'.
Since some mathematical information is missing, I assume the Z-axis of the image and the camera are parallel. In that case, you have to add a 90° rotation about either the X or the Y axis to get a perpendicular view; this takes care of the orientation.
The perspectiveTransform() function should be helpful from there on.
Check out this question for code insights: How to calculate perspective transform for OpenCV from rotation angles?
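As a rough illustration of the steps above (not the only way to do it), one can build the rotation matrix from the gyroscope's roll-pitch-yaw angles and warp the image with the rotation-induced homography H = K·R·K⁻¹ using warpPerspective, the image-warping counterpart of perspectiveTransform. The intrinsic matrix K below is a placeholder, and the axis order and angle signs are assumptions that may need adjusting for a particular gyroscope.
import cv2
import numpy as np

def rotation_from_rpy(roll, pitch, yaw):
    # Roll-pitch-yaw rotation matrix (X, then Y, then Z), angles in radians.
    rx, _ = cv2.Rodrigues(np.array([roll, 0.0, 0.0]))
    ry, _ = cv2.Rodrigues(np.array([0.0, pitch, 0.0]))
    rz, _ = cv2.Rodrigues(np.array([0.0, 0.0, yaw]))
    return rz @ ry @ rx

# Placeholder intrinsic matrix; use the values from your camera calibration.
K = np.array([[800.0,   0.0, 640.0],
              [  0.0, 800.0, 360.0],
              [  0.0,   0.0,   1.0]])

img = cv2.imread('frame.png')                                 # the picture taken at the measured pose
R = rotation_from_rpy(np.deg2rad(10), np.deg2rad(-5), 0.0)    # gyro angles (example values)
H = K @ R @ np.linalg.inv(K)                                  # homography induced by a pure rotation
warped = cv2.warpPerspective(img, H, (img.shape[1], img.shape[0]))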
I am trying to determine camera position in world coordinates, relative to a fiducial position based on fiducial marker found in a scene.
My methodology for determining the viewMatrix is described here:
Determine camera pose?
I have the rotation and translation, [R|t], from the trained marker to the scene image. Given the camera calibration, and thus the camera intrinsics, I should be able to discern the camera's position in world coordinates based on the perspective and orientation of the marker found in the scene image.
Can anybody direct me to a discussion or example similar to this? I'd like to know my camera's position based on the fiducial marker, and I'm sure that something similar has been done before; I'm just not searching with the correct keywords.
Appreciate your guidance.
What do you mean by world coordinates? If you mean object coordinates, then you should use the inverse transformation of solvePnP's result.
Given a view matrix [R|t], we have that inv([R|t]) = [R'|-R'*t], where R' is the transpose of R. In OpenCV:
cv::Mat rvec, tvec;
cv::solvePnP(objectPoints, imagePoints, intrinsics, distortion, rvec, tvec);
cv::Mat R;
cv::Rodrigues(rvec, R); // convert rvec to a 3x3 rotation matrix
R = R.t(); // inverse rotation
tvec = -R * tvec; // translation of inverse
// camPose is a 4x4 matrix with the pose of the camera in the object frame
cv::Mat camPose = cv::Mat::eye(4, 4, R.type());
R.copyTo(camPose.rowRange(0, 3).colRange(0, 3)); // copies R into camPose
tvec.copyTo(camPose.rowRange(0, 3).colRange(3, 4)); // copies tvec into camPose
Update #1:
Result of solvePnP
solvePnP estimates the object pose given a set of object points (model coordinates), their corresponding image projections (image coordinates), as well as the camera matrix and the distortion coefficients.
The object pose is given by two vectors, rvec and tvec. rvec is a compact representation of a rotation matrix for the pattern view seen on the image. That is, rvec together with the corresponding tvec brings the fiducial pattern from the model coordinate space (in which object points are specified) to the camera coordinate space.
That is, we are in the camera coordinate space: it moves with the camera, and the camera sits at its origin. The camera axes have the same directions as the image axes, so
the x-axis points to the right of the camera,
the y-axis points down,
and the z-axis points in the direction the camera is looking.
The same applies to the model coordinate space, so if you specified the origin in the upper right corner of the fiducial pattern, then
the x-axis points to the right (e.g. along the longer side of your pattern),
the y-axis points toward the other side (e.g. along the shorter one),
and the z-axis points toward the ground.
You can specify the world origin as the first of the object points, that is, the first object point is set to (0, 0, 0) and all other points have z = 0 (in the case of planar patterns). Then tvec (combined with rvec) points to the origin of the world coordinate space in which you placed the fiducial pattern. solvePnP's output has the same units as the object points.
Take a look at the following: 6dof positional tracking. I think this is very similar to what you need.
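For anyone working in Python, here is a rough equivalent of the C++ snippet above, under the same assumptions: the fiducial corners are given in the model frame with z = 0, their detected pixel locations are known, and K/dist come from calibration (all numbers below are placeholders).
import cv2
import numpy as np

# Placeholder calibration; replace with your real intrinsics and distortion.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
dist = np.zeros(5)

# Fiducial corners in the model frame (z = 0) and placeholder pixel detections.
object_points = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=np.float64)
image_points = np.array([[300, 200], [420, 205], [415, 330], [295, 325]], dtype=np.float64)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist)
R, _ = cv2.Rodrigues(rvec)

cam_pose = np.eye(4)                         # 4x4 pose of the camera in the model frame
cam_pose[:3, :3] = R.T                       # inverse rotation
cam_pose[:3, 3] = (-R.T @ tvec).ravel()      # camera center in model coordinates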
I know that in the general case, making this conversion is impossible since depth information is lost going from 3d to 2d.
However, I have a fixed camera and I know its camera matrix. I also have a planar calibration pattern of known dimensions - let's say that in world coordinates it has corners (0,0,0) (2,0,0) (2,1,0) (0,1,0). Using opencv I can estimate the pattern's pose, giving the translation and rotation matrices needed to project a point on the object to a pixel in the image.
Now: this 3d to image projection is easy, but how about the other way? If I pick a pixel in the image that I know is part of the calibration pattern, how can I get the corresponding 3d point?
I could iteratively choose some random 3d point on the calibration pattern, project to 2d, and refine the 3d point based on the error. But this seems pretty horrible.
Given that this unknown point has world coordinates something like (x,y,0) -- since it must lie on the z=0 plane -- it seems like there should be some transformation that I can apply, instead of doing the iterative nonsense. My maths isn't very good though - can someone work out this transformation and explain how you derive it?
Here is a closed form solution that I hope can help someone. Using the conventions in the image from your comment above, you can use centered-normalized pixel coordinates (usually after distortion correction) u and v, and extrinsic calibration data, like this:
|Tx| |r11 r21 r31| |-t1|
|Ty| = |r12 r22 r32|.|-t2|
|Tz| |r13 r23 r33| |-t3|
|dx| |r11 r21 r31| |u|
|dy| = |r12 r22 r32|.|v|
|dz| |r13 r23 r33| |1|
With these intermediate values, the coordinates you want are:
X = (-Tz/dz)*dx + Tx
Y = (-Tz/dz)*dy + Ty
Explanation:
The vector [t1, t2, t3]^T is the position of the origin of the world coordinate system (the (0,0) of your calibration pattern) with respect to the camera's optical center; by reversing the signs and inverting the rotation transformation we obtain the vector T = [Tx, Ty, Tz]^T, which is the position of the camera center in the world reference frame.
Similarly, [u, v, 1]^T is the vector along which the observed point lies in the camera reference frame (starting from the camera center). By inverting the rotation transformation we obtain the vector d = [dx, dy, dz]^T, which represents the same direction in the world reference frame.
To invert the rotation transformation we take advantage of the fact that the inverse of a rotation matrix is its transpose (link).
Now we have a line with direction vector d starting from point T, the intersection of this line with plane Z=0 is given by the second set of equations. Note that it would be similarly easy to find the intersection with the X=0 or Y=0 planes or with any plane parallel to them.
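Here is a direct numpy transcription of those formulas, offered as a sketch: rvec and tvec are the extrinsics from solvePnP (or equivalent), and u, v must already be centered-normalized coordinates, e.g. u = (x - cx)/fx and v = (y - cy)/fy after distortion correction.
import cv2
import numpy as np

def deproject_to_plane(u, v, rvec, tvec):
    # Intersect the viewing ray through the normalized pixel (u, v) with the
    # Z = 0 plane of the world/pattern frame, following the formulas above.
    R, _ = cv2.Rodrigues(rvec)          # rotation bringing world points into the camera frame
    T = R.T @ (-tvec.reshape(3))        # camera center in world coordinates
    d = R.T @ np.array([u, v, 1.0])     # ray direction in world coordinates
    s = -T[2] / d[2]                    # scale that brings the ray onto Z = 0
    return T + s * d                    # [X, Y, 0] in world coordinates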
Yes, you can. If you have a transformation matrix that maps a point in the 3D world to the image plane, you can just use the inverse of this transformation matrix to map an image-plane point back to the 3D world point. If you already know that z = 0 for the 3D world point, this results in a unique solution for the point, so there is no need to iteratively choose some random 3D point. I had a similar problem where I had a camera mounted on a vehicle with a known position and camera calibration matrix, and I needed to know the real-world location of a lane marking captured on the image plane of the camera.
If you have Z = 0 for your points in world coordinates (which should be true for a planar calibration pattern), then instead of inverting the rotation transformation you can calculate a homography between the calibration pattern and your camera image.
Once you have the homography, you can select a point in the image and get its location in world coordinates using the inverse homography.
This is true as long as the point in world coordinates lies on the same plane as the points used for calculating the homography (in this case Z = 0).
This approach to this problem was also discussed below this question on SO: Transforming 2D image coordinates to 3D world coordinates with z = 0
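A short sketch of that homography route, using the pattern corners from the question and placeholder pixel locations; only the X, Y world coordinates of the plane are needed, since Z = 0 everywhere on it.
import cv2
import numpy as np

# Pattern corners on the Z = 0 world plane and their (placeholder) pixel locations.
world_xy = np.array([[0, 0], [2, 0], [2, 1], [0, 1]], dtype=np.float32)
pixels = np.array([[100, 400], [500, 390], [510, 180], [110, 190]], dtype=np.float32)

H, _ = cv2.findHomography(world_xy, pixels)     # maps the world plane into the image
H_inv = np.linalg.inv(H)

def pixel_to_plane(x, y):
    # Map an image pixel back onto the Z = 0 world plane.
    p = H_inv @ np.array([x, y, 1.0])
    return p[:2] / p[2]                          # (X, Y) on the plane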