Unable to understand OpenCV's built-in calibrateCamera function

I am calibrating a camera using OpenCV's built-in function calibrateCamera:
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1],None,None)
print("translation vector size",len(tvecs))
print("rotation vector size",len(rvecs))
print("translation \n",tvecs)
print("rotation \n",rvecs)
Output:
translation vector size 8
rotation vector size 8
translation
[array([[-2.89545711],
[ 0.53309405],
[16.90937607]]), array([[ 2.5887548 ],
[ 4.28267707],
[13.76961517]]), array([[-3.3813951 ],
[ 0.46023276],
[11.62316805]]), array([[-3.94407341],
[ 2.24712782],
[12.75758635]]), array([[-2.46697627],
[-3.45827811],
[12.90925656]]), array([[ 2.26913044],
[-3.25178618],
[15.65704473]]), array([[-3.65842398],
[-4.35145288],
[17.28001749]]), array([[-1.53432042],
[-4.34836431],
[14.06280739]])]
rotation
[array([[-0.08450996],
[ 0.35247622],
[-1.54211812]]), array([[-0.23013064],
[ 1.02133593],
[-2.79358726]]), array([[-0.34782976],
[-0.06411541],
[-1.20030736]]), array([[-0.27641699],
[ 0.10465832],
[-1.56231228]]), array([[-0.47298366],
[ 0.09331131],
[-0.22505762]]), array([[0.068391 ],
[0.44710268],
[0.10818745]]), array([[-0.09848595],
[ 0.32272789],
[ 0.31561383]]), array([[-0.35190574],
[ 0.24381052],
[ 0.2106984 ]])]
The obtained translation and rotation vectors each consist of eight 3x1 array objects. I expected a 3x3 rotation matrix and a 3x1 translation vector. Please let me know how these values relate to the translation and rotation matrices, and suggest how I can derive the translation and rotation matrices from the obtained vectors.
Thanks !

The eight sets of arrays correspond to the eight images you fed in.
The tvecs and rvecs you get from calibrateCamera() are vectors. If you want the matrix form, you have to use Rodrigues().
The 3x1 translation vector is already the one you want.
The 3x3 rotation matrix can be obtained with cv2.Rodrigues():
for rvec in rvecs:
    R_matrix, _ = cv2.Rodrigues(rvec)
Also, if you want to concatenate the [R|t] matrix, try this:
Rt_matrix = np.concatenate((R_matrix, tvec), axis=1)
See the OpenCV documentation for more information.
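Putting the pieces together, a minimal sketch (assuming rvecs and tvecs are exactly what the calibrateCamera call above returned) that builds one extrinsic matrix per calibration image:
import numpy as np
import cv2

# One 4x4 homogeneous [R | t] matrix per calibration image.
extrinsics = []
for rvec, tvec in zip(rvecs, tvecs):
    R_matrix, _ = cv2.Rodrigues(rvec)                      # 3x1 vector -> 3x3 matrix
    Rt_matrix = np.concatenate((R_matrix, tvec), axis=1)   # 3x4 [R | t]
    extrinsics.append(np.vstack((Rt_matrix, [0, 0, 0, 1])))  # 4x4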

Related

Orthographic Projection in OpenCV for Projecting 3D Points

I have the following inputs:
A color image of size 480 x 848 pixels
An aligned depth image of size 480 x 848 pixels
Camera intrinsic parameters
A transformation from the camera to my frame located at the top
Consider the camera looking at an object from an angle. Furthermore, assume that we have defined a frame at the top of this object. I want to transform the color and depth image from the camera to this frame, as if the camera were mounted at this frame.
The 3D points (a point cloud with x, y, and z values, without color) can be obtained using the depth image and the camera parameters. I want to transform these 3D points (with color) into the top frame. Because these 3D points are actual points in 3D space, I believe this is just an orthographic projection.
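The back-projection I have in mind is roughly the following sketch (fx, fy, cx, cy come from cameraMatrix in the sample code below, and I am assuming the depth image stores millimetres; distortion is ignored):
import numpy as np

# Sketch: back-project every depth pixel to a 3D point in the camera frame,
# then move the points into the top frame with the 4x4 transform Rt.
fx, fy = 428.12915039, 427.6109314
cx, cy = 418.72729492, 238.20678711

v, u = np.indices(depth.shape)                 # pixel row/column grids
z = depth.astype(np.float32) / 1000.0          # assumed depth scale: mm -> m
x = (u - cx) * z / fx
y = (v - cy) * z / fy
points_cam = np.stack((x, y, z, np.ones_like(z)), axis=-1).reshape(-1, 4)

# Depending on how Rt is defined, Rt or its inverse is needed here.
points_top = (Rt @ points_cam.T).T[:, :3]      # 3D points in the top frame
colors = image.reshape(-1, 3)                  # matching colors per point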
Sample Code
In [1]: import numpy as np
In [2]: import cv2 as cv
In [3]: cv.__version__
Out[3]: '4.2.0'
In [4]: image = cv.imread("image.png")
In [5]: image.shape, image.dtype
Out[5]: ((480, 848, 3), dtype('uint8'))
In [6]: depth = cv.imread("depth.png", cv.CV_16UC1)
In [7]: depth.shape, depth.dtype
Out[7]: ((480, 848), dtype('uint16'))
In [8]: mask = np.ones_like(depth) * 255
In [9]: mask = mask.astype(np.uint8)
In [10]: mask.shape, mask.dtype
Out[10]: ((480, 848), dtype('uint8'))
In [11]: # define transformation from camera to my frame located at top
In [12]: Rt = np.array([[ 1. , -0. , 0. , 0. ],
...: [ 0. , 0.89867918, 0.43860659, -0.191 ],
...: [-0. , -0.43860659, 0.89867918, 0.066 ],
...: [ 0. , 0. , 0. , 1. ]])
In [13]: # camera focal lengths and principal point
In [14]: cameraMatrix = np.array([[ 428.12915039, 0, 418.72729492 ],
...: [ 0, 427.6109314, 238.20678711 ],
...: [ 0, 0, 1 ]])
In [15]: # camera distortion parameters (k1, k2, t1, t2, k3)
In [16]: distCoeff = np.array([-0.05380916, 0.0613398, -0.00064336, 0.00040269, -0.01984365])
In [17]: warpedImage, warpedDepth, warpedMask = cv.rgbd.warpFrame(image, depth, mask, Rt, cameraMatrix, distCoeff)
In [18]: cv.imwrite("warpedImage.png", warpedImage)
Out[18]: True
In [19]: cv.imwrite("warpedDepth.png", warpedDepth)
Out[19]: True
Frame Visualization
The camera is located at camera_color_optical_frame and looking at the object at an angle
The top frame named my_frame is situated on the top of the object
The object is kept at workspace frame
Input Images
Color Image
Depth Image
Output Images
Warped Image
Warped Depth
Expected Output
The output image should be similar to a picture taken from the camera at the top position. A sample image is shown below. We know that we cannot get precisely this image; nevertheless, the image below is just for reference purposes.
Notice carefully that this image does not contain the red color attached to the object's walls.
Assuming you are fine with blank space due to parts not seen in the image, using another package, and some extra processing time, you can use Open3D to transform (basically, rotate) the RGBD image by the required amount.
First, create an Open3D RGBD image:
import open3d as o3d
color_raw = o3d.io.read_image("image.png")
depth_raw = o3d.io.read_image("depth.png")
rgbd_image = o3d.geometry.RGBDImage.create_from_color_and_depth(
    color_raw, depth_raw)
Then convert it to a point cloud (I tried transforming the RGBD image directly, but that didn't work, so I convert first):
pcd = o3d.geometry.PointCloud.create_from_rgbd_image(
    rgbd_image,
    o3d.camera.PinholeCameraIntrinsic(
        o3d.camera.PinholeCameraIntrinsicParameters.PrimeSenseDefault))
# Flip it, otherwise the pointcloud will be upside down
pcd.transform([[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1]])
o3d.visualization.draw_geometries([pcd], zoom=0.5)
Replace the default camera parameters with your own.
Then you can apply a transformation according to the frames, as in the sketch below.
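For example, a minimal sketch using the intrinsics and the Rt from the question (whether Rt or its inverse is needed depends on the direction your transform is defined in):
import numpy as np

# Sketch: build the intrinsics from the question's cameraMatrix instead of the
# PrimeSense defaults (arguments: width, height, fx, fy, cx, cy), then move the
# point cloud into the top frame.
intrinsic = o3d.camera.PinholeCameraIntrinsic(
    848, 480, 428.12915039, 427.6109314, 418.72729492, 238.20678711)
pcd = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd_image, intrinsic)

Rt = np.array([[ 1., -0.,          0.,          0.   ],
               [ 0.,  0.89867918,  0.43860659, -0.191],
               [-0., -0.43860659,  0.89867918,  0.066],
               [ 0.,  0.,          0.,          1.   ]])
pcd.transform(np.linalg.inv(Rt))   # or Rt, depending on the transform's direction
o3d.visualization.draw_geometries([pcd])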
Next, if you have to convert the point cloud back to an RGBD image, follow this example.
Secondly, here is a similar unanswered question where the user ends up using a perspective transform for 2-D images.
PS. new here, suggestions welcome

Transformation Matrix From EstimatePoseSingleMarkers

I am trying to get the transformation matrix from my camera to the midpoint of an ArUco marker, and I use the cv2.aruco.estimatePoseSingleMarkers function.
In the description it says:
The returned transformation is the one that transforms points from each marker coordinate system to the camera coordinate system.
I pass the corners from the marker detection result to the pose estimation function and convert the rotation vector to a rotation matrix with cv2.Rodrigues. Already the resulting translation vector differs from the measured translation from the camera coordinates to the marker coordinates, which to my knowledge should be the same.
Regardless, I initialize a 4x4 transformation matrix and put the rotation matrix and the translation vector from the pose estimation into it, which I assume gives the transformation from the marker coordinate frame to the camera coordinate frame. But when I check the transformation with another point from the environment, it shows that the transformation matrix is not correct.
I am sure about the MARKER_SIZE.
marker2camera_r_vec, marker2camera_t_vec, _ = cv2.aruco.estimatePoseSingleMarkers(corners, MARKER_SIZE, camera.mtx, camera.dist)
marker2camera_r_mat = np.array(cv2.Rodrigues(marker2camera_r_vec)[0])
transformation_matrix = np.zeros([4, 4])
transformation_matrix[0:3, 0:3] = marker2camera_r_mat
transformation_matrix[0:3, 3] = marker2camera_t_vec
transformation_matrix[3, 3] = 1
Example results:
marker2camera_r_mat =
[[-0.96533802 -0.03093691 -0.25916292],
[ 0.07337548 0.92073721 -0.38322189],
[ 0.25047664 -0.38895487 -0.88655263]]
marker2camera_t_vec =
[ 1.45883855 6.98282269 77.73744481]
transformation_matrix =
[[-9.65338018e-01 -3.09369077e-02 -2.59162919e-01 1.45883855e+00],
[ 7.33754824e-02 9.20737214e-01 -3.83221895e-01 6.98282269e+00],
[ 2.50476645e-01 -3.88954869e-01 -8.86552627e-01 7.77374448e+01],
[ 0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00]]
The actual point coordinates of the marker in camera coordinates are:
[-0.086 0.12 0.83]
Can anyone tell me what I am doing wrong and explain the steps to obtain the transformation matrix from the camera coordinates to the marker coordinates via cv2.aruco.estimatePoseSingleMarkers?
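For reference, since estimatePoseSingleMarkers returns the marker-to-camera transform, my understanding is that the camera-to-marker direction should follow from inverting the 4x4 matrix; a minimal sketch, assuming transformation_matrix is the matrix built above:
import numpy as np

# Sketch: invert the rigid transform [R | t; 0 0 0 1]. Its inverse is
# [R.T | -R.T @ t; 0 0 0 1], which maps camera coordinates to marker
# coordinates if the original maps marker coordinates to camera coordinates.
R = transformation_matrix[0:3, 0:3]
t = transformation_matrix[0:3, 3]
camera2marker = np.eye(4)
camera2marker[0:3, 0:3] = R.T
camera2marker[0:3, 3] = -R.T @ t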

Inverse Perspective Transform?

I am trying to find the bird's eye image from a given image. I also have the rotations and translations (and the intrinsic matrix) required to convert it into the bird's eye plane. My aim is to find an inverse homography matrix (3x3).
rotation_x = np.asarray([[1, 0, 0, 0],
                         [0, np.cos(R_x), -np.sin(R_x), 0],
                         [0, np.sin(R_x),  np.cos(R_x), 0],
                         [0, 0, 0, 1]], np.float32)
translation = np.asarray([[1, 0, 0, 0],
                          [0, 1, 0, 0],
                          [0, 0, 1, -t_y / (dp_y * np.sin(R_x))],
                          [0, 0, 0, 1]], np.float32)
intrinsic = np.asarray([[s_x * f / dp_x, 0, 0, 0],
                        [0, 1 * f / dp_y, 0, 0],
                        [0, 0, 1, 0]], np.float32)
# The projection matrix to lift the image coordinates to the 3-D domain, from (x, y, 1) to (x, y, 0, 1); not sure if this is the right approach
projection = np.asarray([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 0],
                         [0, 0, 1]], np.float32)
homography_matrix = intrinsic @ translation @ rotation_x @ projection
inv = cv2.warpPerspective(source_image, homography_matrix, (w, h), flags=cv2.INTER_CUBIC | cv2.WARP_INVERSE_MAP)
My question is: is this the right approach? I can manually set a suitable t_y and R_x, but not for the (t_y, R_x) that is provided.
First premise: your bird's eye view will be correct only for one specific plane in the image, since a homography can only map planes (including the plane at infinity, corresponding to a pure camera rotation).
Second premise: if you can identify a quadrangle in the first image that is the projection of a rectangle in the world, you can directly compute the homography that maps the quad onto the rectangle (i.e. the "bird's eye view" of the quad), and warp the image with it, setting the scale so the image warps to a desired size. No need to use the camera intrinsics. Example: you have the image of a building with rectangular windows, and you know the width/height ratio of these windows in the world.
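A minimal sketch of that second premise; the corner coordinates and the 400x300 output size below are placeholders for a quad you pick by hand:
import numpy as np
import cv2

# Map a hand-picked quad (the projection of a world rectangle) onto an
# axis-aligned rectangle; list the corners in the same order in both arrays.
quad = np.float32([[120, 80], [520, 95], [540, 420], [100, 400]])
rect = np.float32([[0, 0], [400, 0], [400, 300], [0, 300]])

H = cv2.getPerspectiveTransform(quad, rect)
birds_eye = cv2.warpPerspective(source_image, H, (400, 300))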
Sometimes you can't find rectangles, but your camera is calibrated, and thus the problem you describe comes into play. Let's do the math. Assume the plane you are observing in the given image is Z=0 in world coordinates. Let K be the 3x3 intrinsic camera matrix and [R, t] the 3x4 matrix representing the camera pose in XYZ world frame, so that if Pc and Pw represent the same 3D point respectively in camera and world coordinates, it is Pc = R*Pw + t = [R, t] * [Pw.T, 1].T, where .T means transposed. Then you can write the camera projection as:
s * p = K * [R, t] * [Pw.T, 1].T
where s is an arbitrary scale factor and p is the pixel that Pw projects onto. But if Pw=[X, Y, Z].T is on the Z=0 plane, the 3rd column of R only multiplies zeros, so we can ignore it. If we then denote with r1 and r2 the first two columns of R, we can rewrite the above equation as:
s * p = K * [r1, r2, t] * [X, Y, 1].T
But K * [r1, r2, t] is a 3x3 matrix that transforms points on a 3D plane to points on the camera plane, so it is a homography.
If the plane is not Z=0, you can repeat the same argument replacing [R, t] with [R, t] * inv([Rp, tp]), where [Rp, tp] is the coordinate transform that maps a frame on the plane, with the plane normal being the Z axis, to the world frame.
Finally, to obtain the bird's eye view, you select a rotation R whose third column (the components of the world's Z axis in camera frame) is opposite to the plane's normal.
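In code, the construction above looks roughly like this (a sketch; K, R, t are assumed to be the calibrated intrinsics and the pose of the Z=0 plane in the camera frame, and source_image, w, h are as in the question):
import numpy as np
import cv2

# Homography from the Z=0 world plane to the image: H = K * [r1, r2, t].
H = K @ np.column_stack((R[:, 0], R[:, 1], t.ravel()))

# Warping with the inverse of H resamples the image into plane coordinates,
# i.e. the bird's eye view; fold in a scale/offset to control the output size.
birds_eye = cv2.warpPerspective(source_image, np.linalg.inv(H), (w, h))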

Field of view of a GoPro camera

I have calibrated my GoPro Hero 4 Black using the Camera Calibration Toolbox for MATLAB and calculated its fields of view and focal length using OpenCV's calibrationMatrixValues(). These, however, differ from GoPro's specifications. Instead of 118.2/69.5 FOVs I get 95.4/63.4, and a focal length of 2.8mm instead of 17.2mm. Obviously something is wrong.
I suppose the calibration itself is correct since image undistortion seems to be working well.
Can anyone please give me a hint where I made a mistake? I am posting my code below.
Thanks.
Code
cameraMatrix = new Mat(3, 3, 6);
for (int i = 0; i < cameraMatrix.height(); i++)
    for (int j = 0; j < cameraMatrix.width(); j++) {
        cameraMatrix.put(i, j, 0);
    }
cameraMatrix.put(0, 0, 582.18394);
cameraMatrix.put(0, 2, 663.50655);
cameraMatrix.put(1, 1, 582.52915);
cameraMatrix.put(1, 2, 378.74541);
cameraMatrix.put(2, 2, 1.);
org.opencv.core.Size size = new org.opencv.core.Size(1280, 720);
//output parameters
double [] fovx = new double[1];
double [] fovy = new double[1];
double [] focLen = new double[1];
double [] aspectRatio = new double[1];
Point ppov = new Point(0, 0);
org.opencv.calib3d.Calib3d.calibrationMatrixValues(cameraMatrix, size,
6.17, 4.55, fovx, fovy, focLen, ppov, aspectRatio);
System.out.println("FoVx: " + fovx[0]);
System.out.println("FoVy: " + fovy[0]);
System.out.println("Focal length: " + focLen[0]);
System.out.println("Principal point of view; x: " + ppov.x + ", y: " + ppov.y);
System.out.println("Aspect ratio: " + aspectRatio[0]);
Results
FoVx: 95.41677635378488
FoVy: 63.43170132212425
Focal length: 2.8063085232812504
Principal point of view; x: 3.198308916796875, y: 2.3934605770833333
Aspect ratio: 1.0005929569269807
GoPro specifications
https://gopro.com/help/articles/Question_Answer/HERO4-Field-of-View-FOV-Information
Edit
Matlab calibration results
Focal Length: fc = [ 582.18394 582.52915 ] ± [ 0.77471 0.78080 ]
Principal point: cc = [ 663.50655 378.74541 ] ± [ 1.40781 1.13965 ]
Skew: alpha_c = [ -0.00028 ] ± [ 0.00056 ] => angle of pixel axes = 90.01599 ± 0.03208 degrees
Distortion: kc = [ -0.25722 0.09022 -0.00060 0.00009 -0.01662 ] ± [ 0.00228 0.00276 0.00020 0.00018 0.00098 ]
Pixel error: err = [ 0.30001 0.28188 ]
One of the images used for calibration
And the undistorted image
You have entered 6.17mm and 4.55mm for the sensor size in OpenCV, which corresponds to an aspect ratio of 1.36, whereas your resolution (1280x720) is 1.78 (approximately 16:9 format).
Did you crop your image before MATLAB calibration?
The pixel size seems to be 1.55µm according to this GoPro page (which is, by the way, astonishingly small!). If pixels are square, and they should be on this type of commercial camera, that means your inputs are not coherent. The computed sensor size should be:
[Sensor width, Sensor height] = [1280, 720] * 1.55 * 10^-3 = [1.98, 1.12] mm
Even considering the maximal video resolution, which is 3840 x 2160, we obtain [5.95, 3.35] mm, still different from your input.
Please see this explanation about equivalent focal length to understand why the actual focal length of the camera is not 17.2mm but 17.2 * 5.95 / 36 ≈ 2.8mm. In that case, compute the FOV using the formulas here, for instance. You will indeed find values of 93.5°/61.7° (close to your outputs, but still not what is written in the specifications, probably because of some optical distortion due to the wide angle).
What I do not understand, though, is how the returned focal distance can be right when the sensor size entered is wrong. Could you give more info and/or send an image?
Edits after question updates
On these cameras, with a working resolution of 1280x720, the image is downsampled but not cropped, so what I said above about sensor dimensions does not apply. The sensor size to consider is indeed the one used (6.17 x 4.55), as explained in your first comment.
The FOV is constrained by the calibration matrix inputs (fx, fy, cx, cy) given in pixels and the resolution. You can check it by typing:
2*DEGREES(ATAN(1280/(2*582.18394))) (= 95.416776...°)
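Equivalently, a quick Python sketch of the same check:
import math

# FOV implied by the pinhole intrinsics and the working resolution.
fx, fy = 582.18394, 582.52915
fov_x = 2 * math.degrees(math.atan(1280 / (2 * fx)))   # ~95.42 degrees
fov_y = 2 * math.degrees(math.atan(720 / (2 * fy)))    # ~63.43 degrees
print(fov_x, fov_y)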
This FOV value is smaller than expected, but judging by the undistorted image, your MATLAB distortion model is right and the calibration is correct. The barrel distortion due to the wide angle seems well corrected by the rewarp you applied.
However, the MATLAB toolbox uses a pinhole model, which is linear and cannot account for intrinsic parameters such as lens distortion. I assume this from this page:
https://fr.mathworks.com/help/vision/ug/camera-calibration.html
Hence, my best guess is that unless you find a model which fits the GoPro camera more accurately (maybe a wide-angle lens model), the MATLAB calibration will return an intrinsic camera matrix corresponding to the "linear" undistorted image, and the FOV will indeed be smaller (in the case of barrel distortion). You will have to apply the distortion coefficients associated with the calibration to retrieve the actual FOV value.
We can see in the corrected image that side parts of the FOV get pushed out of bounds. If you had warped the image entirely, you would find that some undistorted pixel coordinates exceed [-1280/2; +1280/2] (horizontally, and likewise vertically). Then, replacing opencv.core.Size(1280, 720) with the most extreme ranges obtained, you would hopefully retrieve the GoPro website values.
In conclusion, I think you can rely on the focal distance value that you obtained if you make measurements in the center of your image; otherwise there is too much distortion and it does not apply.

Pose estimation: solvePnP and epipolar geometry do not agree

I have a relative camera pose estimation problem where I am looking at a scene with differently oriented cameras spaced a certain distance apart. Initially, I am computing the essential matrix using the 5 point algorithm and decomposing it to get the R and t of camera 2 w.r.t camera 1.
I thought it would be a good idea to do a check by triangulating the two sets of image points into 3D and then running solvePnP on the 3D-2D correspondences, but the result I get from solvePnP is way off. I am trying to do this to "refine" my pose, as the scale can change from one frame to another. Anyway, in one case I had a 45 degree rotation between camera 1 and camera 2 along the Z axis, and the epipolar geometry part gave me this answer:
Relative camera rotation is [1.46774, 4.28483, 40.4676]
Translation vector is [-0.778165583410928; -0.6242059242696293; -0.06946429947410336]
solvePnP, on the other hand..
Camera1: rvecs [0.3830144497209735; -0.5153903947692436; -0.001401186630803216]
tvecs [-1777.451836911453; -1097.111339375749; 3807.545406775675]
Euler1 [24.0615, -28.7139, -6.32776]
Camera2: rvecs [1407374883553280; 1337006420426752; 774194163884064.1] (!!)
tvecs[1.249151852575814; -4.060149502748567; -0.06899980661249146]
Euler2 [-122.805, -69.3934, 45.7056]
Something is troublingly off with the rvecs of camera2 and tvec of camera 1. My code involving the point triangulation and solvePnP looks like this:
points1.convertTo(points1, CV_32F);
points2.convertTo(points2, CV_32F);
// Homogenize image points
points1.col(0) = (points1.col(0) - pp.x) / focal;
points2.col(0) = (points2.col(0) - pp.x) / focal;
points1.col(1) = (points1.col(1) - pp.y) / focal;
points2.col(1) = (points2.col(1) - pp.y) / focal;
points1 = points1.t(); points2 = points2.t();
cv::triangulatePoints(P1, P2, points1, points2, points3DH);
cv::Mat points3D;
convertPointsFromHomogeneous(Mat(points3DH.t()).reshape(4, 1), points3D);
cv::solvePnP(points3D, points1.t(), K, noArray(), rvec1, tvec1, 1, CV_ITERATIVE );
cv::solvePnP(points3D, points2.t(), K, noArray(), rvec2, tvec2, 1, CV_ITERATIVE );
And then I am converting the rvecs through Rodrigues to get the Euler angles: but since rvecs and tvecs themselves seem to be wrong, I feel something's wrong with my process. Any pointers would be helpful. Thanks!
