OpenCV camera calibration and solvePnP translation results

I am attempting to calibrate a sensor using a chessboard.
I make around 50 runs, and after calibrating the camera I use solvePnP to establish the coordinate system, since the well-defined chessboard gives me the real-world coordinates.
As input for solvePnP I use the corner points with their corresponding real-world coordinates.
My biggest issue is that the translation vector I calculate from solvePnP looks strange. My understanding is that the translation vector is the actual offset between the camera and the coordinate system origin, which I define as the upper part of the chessboard. But I get values that look random: Tz comes out at around 6000, even though the distance between the camera and the chessboard is around 1600 mm (and I am not passing any depth values to solvePnP anyway).
Any input on what could be off?
Code Sample:
50 x DrawChessboardCorners
Corner result: {X = 1170.45984, Y = 793.002}
{X = 1127.80371, Y = 792.54425}
3D points: {X = 175, Y = 70, Z = 0}
{X = 140, Y = 70, Z = 0}
18 (6x3) corners per run, for a total of 50 runs.
Afterwards I calibrate the camera:
CalibrateCamera(_objectsList, _pointsList,
                new Size(_sensorWrapper.Width, _sensorWrapper.Height),
                cameraMatrix, distCoeffs,
                CalibType.Default,
                new MCvTermCriteria(30, 0.1), out _, out _);
Afterwards, using the cameraMatrix and distCoeffs, I call SolvePnP with the top-left, top-right, bottom-left and bottom-right corners and their real-world coordinates.
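For reference, the equivalent call in Python looks roughly like the sketch below; the corner values are placeholders for illustration (my real code is the Emgu CV equivalent), and the intrinsics are taken from the calibration result further down:

import cv2
import numpy as np

# Intrinsics and distortion coefficients from the calibration result shown below.
camera_matrix = np.array([[5969.947, 0.0, 959.687256],
                          [0.0, 6809.737, 540.3694],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.array([0.141516522, 285.377747, 0.008248664, 0.0280253552, 1.5376302])

# Placeholder correspondences: the four outer corners in the board frame (mm)
# and their detected pixel positions (illustrative values, not real detections).
object_points = np.array([[0.0, 0.0, 0.0],
                          [175.0, 0.0, 0.0],
                          [0.0, 70.0, 0.0],
                          [175.0, 70.0, 0.0]])
image_points = np.array([[850.0, 640.0],
                         [1170.5, 642.0],
                         [848.0, 793.0],
                         [1170.5, 793.0]])

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, camera_matrix, dist_coeffs)
# tvec is the board origin expressed in the camera frame, in the same units as
# object_points (mm here), so its length should roughly match the measured
# camera-to-board distance.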
The results I get from the calibration are:
{
  "CameraMatrix": [
    [5969.947, 0.0, 959.687256],
    [0.0, 6809.737, 540.3694],
    [0.0, 0.0, 1.0]
  ],
  "DistortionCoefficientsMatrix": [
    [0.141516522, 285.377747, 0.008248664, 0.0280253552, 1.5376302]
  ],
  "RotationMatrix": [
    [0.9992069, -0.0270648878, 0.0292078461],
    [-0.0003847139, 0.726907134, 0.68673563],
    [-0.0398178138, -0.6862022, 0.7263202]
  ],
  "TranslationMatrix": [
    [22.5370159],
    [-362.535675],
    [5448.58057]
  ],
  "SensorName": "BRIO 4K Stream Edition",
  "SensorIndex": 0,
  "Error": 0.18790249992423425
}
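For what it is worth, a quick check in Python of the camera-to-board distance implied by that translation (my own sanity check, using the values above):

import numpy as np

tvec = np.array([22.5370159, -362.535675, 5448.58057])  # TranslationMatrix from the result above
print(np.linalg.norm(tvec))  # ~5460 mm, versus the ~1600 mm I actually measured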

Related

Orthographic Projection in OpenCV for Projecting 3D Points

I have the following inputs:
A color image of size 480 x 848 pixels
An aligned depth image of size 480 x 848 pixels
Camera intrinsic parameters
A transformation from the camera to my frame located at the top
Consider the camera looking at an object from an angle. Furthermore, assume that we have defined a frame at the top of this object. I want to transform the color and depth image from the camera to this frame, as if the camera were mounted at this frame.
The 3D points (a point cloud with x, y, and z values, without color) can be obtained using the depth image and camera parameters. I want to transform these 3D points (with color) into the top frame. Because these 3D points are actual points in 3D space, I believe this is just an orthographic projection.
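For context, the back-projection and transformation I have in mind look roughly like the sketch below (my own illustration; it ignores lens distortion and assumes the depth image is already in metric units):

import numpy as np

def depth_to_points(depth, camera_matrix):
    # Back-project every pixel of a depth image into an (N, 3) point cloud
    # expressed in the camera frame.
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64)
    x = (u - camera_matrix[0, 2]) * z / camera_matrix[0, 0]
    y = (v - camera_matrix[1, 2]) * z / camera_matrix[1, 1]
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

def to_top_frame(points, Rt):
    # Apply the 4x4 camera-to-frame transform to (N, 3) points.
    homogeneous = np.hstack([points, np.ones((points.shape[0], 1))])
    return (Rt @ homogeneous.T).T[:, :3]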
Sample Code
In [1]: import numpy as np
In [2]: import cv2 as cv
In [3]: cv.__version__
Out[3]: '4.2.0'
In [4]: image = cv.imread("image.png")
In [5]: image.shape, image.dtype
Out[5]: ((480, 848, 3), dtype('uint8'))
In [6]: depth = cv.imread("depth.png", cv.CV_16UC1)
In [7]: depth.shape, depth.dtype
Out[7]: ((480, 848), dtype('uint16'))
In [8]: mask = np.ones_like(depth) * 255
In [9]: mask = mask.astype(np.uint8)
In [10]: mask.shape, mask.dtype
Out[10]: ((480, 848), dtype('uint8'))
In [11]: # define transformation from camera to my frame located at top
In [12]: Rt = np.array([[ 1. , -0. , 0. , 0. ],
...: [ 0. , 0.89867918, 0.43860659, -0.191 ],
...: [-0. , -0.43860659, 0.89867918, 0.066 ],
...: [ 0. , 0. , 0. , 1. ]])
In [13]: # camera focal lengths and principal point
In [14]: cameraMatrix = np.array([[ 428.12915039, 0, 418.72729492 ],
...: [ 0, 427.6109314, 238.20678711 ],
...: [ 0, 0, 1 ]])
In [15]: # camera distortion parameters (k1, k2, t1, t2, k3)
In [16]: distCoeff = np.array([-0.05380916, 0.0613398, -0.00064336, 0.00040269, -0.01984365])
In [17]: warpedImage, warpedDepth, warpedMask = cv.rgbd.warpFrame(image, depth, mask, Rt, cameraMatrix, distCoeff)
In [18]: cv.imwrite("warpedImage.png", warpedImage)
Out[18]: True
In [19]: cv.imwrite("warpedDepth.png", warpedDepth)
Out[19]: True
Frame Visualization
The camera is located at camera_color_optical_frame and looking at the object at an angle
The top frame named my_frame is situated on the top of the object
The object is kept at workspace frame
Input Images
Color Image
Depth Image
Output Images
Warped Image
Warped Depth
Expected Output
The output image should be similar to a picture taken from the camera at the top position. A sample image is shown below. We know that we cannot get precisely this image; nevertheless, the image below is just for reference purposes.
Notice carefully that this image does not contain the red color attached to the object's walls.
Assuming you are fine with blank space due to parts not seen in the image, using another package, and some extra processing time, you can use Open3D to transform (basically, rotate) the RGBD image by the required amount.
First, create an Open3D RGBD image:
import open3d as o3d
color_raw = o3d.io.read_image("image.png")
depth_raw = o3d.io.read_image("depth.png")
rgbd_image = o3d.geometry.RGBDImage.create_from_color_and_depth(
color_raw, depth_raw)
Then convert it to a point cloud (I tried transforming the RGBD image directly, but that didn't work, hence the conversion):
pcd = o3d.geometry.PointCloud.create_from_rgbd_image(
rgbd_image,
o3d.camera.PinholeCameraIntrinsic(
o3d.camera.PinholeCameraIntrinsicParameters.PrimeSenseDefault))
# Flip it, otherwise the pointcloud will be upside down
pcd.transform([[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1]])
o3d.visualization.draw_geometries([pcd], zoom=0.5)
Replace the default camera parameters with your own.
Then you can apply a transformation according to the frames.
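For example, continuing from the rgbd_image created above and using the intrinsics from the question (just a sketch; depending on the convention of your Rt you may need its inverse):

import numpy as np
import open3d as o3d

# Intrinsics from the question: width, height, fx, fy, cx, cy.
intrinsic = o3d.camera.PinholeCameraIntrinsic(
    848, 480, 428.12915039, 427.6109314, 418.72729492, 238.20678711)
pcd = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd_image, intrinsic)

# Camera-to-top-frame transform from the question.
Rt = np.array([[1.0, 0.0, 0.0, 0.0],
               [0.0, 0.89867918, 0.43860659, -0.191],
               [0.0, -0.43860659, 0.89867918, 0.066],
               [0.0, 0.0, 0.0, 1.0]])
pcd.transform(Rt)  # or pcd.transform(np.linalg.inv(Rt)), depending on which way Rt is defined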
Next, if you have to convert the pcd back to an RGBD image, follow this example.
Secondly, here is a similar unanswered question where the user ends up using perspective transform for 2-D images
PS. new here, suggestions welcome

Unable to understand OpenCV's built-in calibrateCamera function

I am calibrating a camera using OpenCV's built-in function calibrateCamera.
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1],None,None)
print("translation vector size",len(tvecs))
print("rotation vector size",len(rvecs))
print("translation \n",tvecs)
print("rotation \n",rvecs)
Output:
translation vector size 8
rotation vector size 8
translation
[array([[-2.89545711],
[ 0.53309405],
[16.90937607]]), array([[ 2.5887548 ],
[ 4.28267707],
[13.76961517]]), array([[-3.3813951 ],
[ 0.46023276],
[11.62316805]]), array([[-3.94407341],
[ 2.24712782],
[12.75758635]]), array([[-2.46697627],
[-3.45827811],
[12.90925656]]), array([[ 2.26913044],
[-3.25178618],
[15.65704473]]), array([[-3.65842398],
[-4.35145288],
[17.28001749]]), array([[-1.53432042],
[-4.34836431],
[14.06280739]])]
rotation
[array([[-0.08450996],
[ 0.35247622],
[-1.54211812]]), array([[-0.23013064],
[ 1.02133593],
[-2.79358726]]), array([[-0.34782976],
[-0.06411541],
[-1.20030736]]), array([[-0.27641699],
[ 0.10465832],
[-1.56231228]]), array([[-0.47298366],
[ 0.09331131],
[-0.22505762]]), array([[0.068391 ],
[0.44710268],
[0.10818745]]), array([[-0.09848595],
[ 0.32272789],
[ 0.31561383]]), array([[-0.35190574],
[ 0.24381052],
[ 0.2106984 ]])]
The obtained translation and rotation outputs each consist of eight 3x1 arrays. I expected a 3x3 rotation matrix and a 3x1 translation vector. Please let me know how these values relate to the translation and rotation matrices, and also how I can derive those matrices from the obtained vectors.
Thanks!
The eight sets of arrays correspond to the eight images you fed in.
The tvecs and rvecs you get from calibrateCamera() are vectors. If you want the matrix form, you have to use Rodrigues().
The 3x1 translation vector is already in the form you want.
The 3x3 rotation matrix can be obtained with cv2.Rodrigues():
for rvec, tvec in zip(rvecs, tvecs):
    R_matrix, _ = cv2.Rodrigues(rvec)
Also, if you want to concatenate the [R t] matrix, try this:
Rt_matrix = np.concatenate((R_matrix, tvec), axis=1)
See the OpenCV documentation for more information.
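As a quick sanity check (just a sketch, reusing objpoints, imgpoints, mtx and dist from the question): each (rvec, tvec) pair is the pose of the calibration pattern in one particular image, so reprojecting the object points with it should land close to the corners detected in that image.

import cv2
import numpy as np

for i, (rvec, tvec) in enumerate(zip(rvecs, tvecs)):
    projected, _ = cv2.projectPoints(objpoints[i], rvec, tvec, mtx, dist)
    diff = imgpoints[i].reshape(-1, 2) - projected.reshape(-1, 2)
    print("image", i, "mean reprojection error:", np.linalg.norm(diff, axis=1).mean())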

Field of view of a GoPro camera

I have calibrated my GoPro Hero 4 Black using the Camera Calibration Toolbox for Matlab and calculated its fields of view and focal length using OpenCV's calibrationMatrixValues(). These, however, differ from GoPro's specifications. Instead of 118.2/69.5 degree FOVs I get 95.4/63.4, and a focal length of 2.8 mm instead of 17.2 mm. Obviously something is wrong.
I suppose the calibration itself is correct since image undistortion seems to be working well.
Can anyone please give me a hint where I made a mistake? I am posting my code below.
Thanks.
Code
cameraMatrix = new Mat(3, 3, 6);
for (int i = 0; i < cameraMatrix.height(); i++)
    for (int j = 0; j < cameraMatrix.width(); j++) {
        cameraMatrix.put(i, j, 0);
    }
cameraMatrix.put(0, 0, 582.18394);
cameraMatrix.put(0, 2, 663.50655);
cameraMatrix.put(1, 1, 582.52915);
cameraMatrix.put(1, 2, 378.74541);
cameraMatrix.put(2, 2, 1.);
org.opencv.core.Size size = new org.opencv.core.Size(1280, 720);
//output parameters
double [] fovx = new double[1];
double [] fovy = new double[1];
double [] focLen = new double[1];
double [] aspectRatio = new double[1];
Point ppov = new Point(0, 0);
org.opencv.calib3d.Calib3d.calibrationMatrixValues(cameraMatrix, size,
6.17, 4.55, fovx, fovy, focLen, ppov, aspectRatio);
System.out.println("FoVx: " + fovx[0]);
System.out.println("FoVy: " + fovy[0]);
System.out.println("Focal length: " + focLen[0]);
System.out.println("Principal point of view; x: " + ppov.x + ", y: " + ppov.y);
System.out.println("Aspect ratio: " + aspectRatio[0]);
Results
FoVx: 95.41677635378488
FoVy: 63.43170132212425
Focal length: 2.8063085232812504
Principal point of view; x: 3.198308916796875, y: 2.3934605770833333
Aspect ratio: 1.0005929569269807
GoPro specifications
https://gopro.com/help/articles/Question_Answer/HERO4-Field-of-View-FOV-Information
Edit
Matlab calibration results
Focal Length: fc = [ 582.18394 582.52915 ] ± [ 0.77471 0.78080 ]
Principal point: cc = [ 663.50655 378.74541 ] ± [ 1.40781 1.13965 ]
Skew: alpha_c = [ -0.00028 ] ± [ 0.00056 ] => angle of pixel axes = 90.01599 ± 0.03208 degrees
Distortion: kc = [ -0.25722 0.09022 -0.00060 0.00009 -0.01662 ] ± [ 0.00228 0.00276 0.00020 0.00018 0.00098 ]
Pixel error: err = [ 0.30001 0.28188 ]
One of the images used for calibration
And the undistorted image
You have entered 6.17 mm and 4.55 mm for the sensor size in OpenCV, which corresponds to an aspect ratio of 1.36, whereas your resolution (1280x720) is 1.76 (approximately 16:9 format).
Did you crop your image before MATLAB calibration?
The pixel size seems to be 1.55 µm from this GoPro page (which is, by the way, astonishingly small!). If pixels are square, and they should be on this type of consumer camera, that means your inputs are not consistent. The computed sensor size should be:
[Sensor width, Sensor height] = [1280, 720] * 1.55 * 10^-3 = [1.98, 1.12] mm
Even considering the maximal video resolution of 3840 x 2160, we obtain [5.95, 3.35] mm, still different from your input.
Please see this explanation about equivalent focal length to understand why the actual focal length of the camera is not 17.2 mm but 17.2 * 5.95 / 36 ≈ 2.8 mm. In that case, compute the FOV using the formulas here, for instance. You will indeed find values of about 93.5°/61.7° (close to your outputs, but still not what is written in the specifications, probably because of optical distortion due to the wide-angle lens).
What I do not understand, though, is how the returned focal distance can be right when the sensor size entered is wrong. Could you give more info and/or send an image?
Edits after question updates
On these cameras, with a working resolution of 1280x720, the image is downsampled but not cropped, so what I said above about sensor dimensions does not apply. The sensor size to consider is indeed the one you used (6.17 x 4.55 mm), as explained in your first comment.
The FOV is constrained by the calibration matrix inputs (fx, fy, cx, cy) given in pixels and the resolution. You can check it by typing:
2*DEGREES(ATAN(1280/(2*582.18394))) (= 95.416776...°)
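The same check in Python, using the Matlab focal lengths and the 1280x720 working resolution:

import math

fx, fy = 582.18394, 582.52915  # focal lengths in pixels from the Matlab calibration
width, height = 1280, 720

fov_x = 2 * math.degrees(math.atan(width / (2 * fx)))   # ~95.42 degrees
fov_y = 2 * math.degrees(math.atan(height / (2 * fy)))  # ~63.43 degrees
print(fov_x, fov_y)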
This FOV value is smaller than expected, but judging by the undistorted image, your MATLAB distortion model is right and the calibration is correct. The barrel distortion due to the wide angle seems well corrected by the rewarp you applied.
However, the MATLAB toolbox uses a pinhole model, which is linear and cannot by itself account for lens distortion. I assume this from this page:
https://fr.mathworks.com/help/vision/ug/camera-calibration.html
Hence, my best guess is that unless you find a model which fits the GoPro camera more accurately (maybe a wide-angle lens model), MATLAB calibration will return an intrinsic camera matrix corresponding to the "linear" undistorted image, and the FOV will indeed be smaller (in the case of barrel distortion). You will have to apply the distortion coefficients associated with the calibration to retrieve the actual FOV value.
We can see in the corrected image that the side parts of the FOV get pushed out of bounds. If you had warped the image entirely, you would find that some undistorted pixel coordinates exceed [-1280/2; +1280/2] horizontally (and likewise vertically). Then, replacing org.opencv.core.Size(1280, 720) with the most extreme ranges obtained, you would hopefully recover the GoPro website values.
In conclusion, I think you can rely on the focal distance value you obtained if you make measurements near the center of your image; otherwise there is too much distortion and it does not apply.

Camera calibration and image processing

I would like some clarification on these questions, please:
1. If I calibrate my camera at a particular resolution, say 640x360, can I use the result for another resolution like 1024x768?
2. I also want to know how many centimetres one pixel covers in my image. It varies from system to system; how do I find that? Also, a pixel is not necessarily square, so I have to find both its width and height. How do I do that?
I am using a Logitech C170, which is a low-speed camera. Is it okay to get an error of around 8 mm when I measure distances in the image and compare them with real-world distances?
EDIT1:
Since the number of millimetres per pixel is sensor_width/image_width, which is the inverse of the density, I can calculate a_x/f and take the inverse, right?
#marol
Intrinsic parameters of left camera:
Focal Length: fc_left = [ 1442.67707 1457.17435 ] ± [ 18.12442 19.46439 ]
Principal point: cc_left = [ 497.66112 291.77311 ] ± [ 42.37874 31.97065 ]
Skew: alpha_c_left = [ 0.00000 ] ± [ 0.00000 ] => angle of pixel axes = 90.00000 ± 0.00000 degrees
Distortion: kc_left = [ 0.02924 -0.65151 -0.01104 -0.01342 0.00000 ] ± [ 0.16553 1.57119 0.00913 0.01306 0.00000 ]
Intrinsic parameters of right camera:
Focal Length: fc_right = [ 1443.32678 1458.82558 ] ± [ 25.55850 26.08659 ]
Principal point: cc_right = [ 567.11672 258.09152 ] ± [ 20.46962 17.87495 ]
Skew: alpha_c_right = [ 0.00000 ] ± [ 0.00000 ] => angle of pixel axes = 90.00000 ± 0.00000 degrees
Distortion: kc_right = [ -0.58576 21.53289 -0.02278 0.00845 0.00000 ] ± [ 0.28148 9.37092 0.00787 0.00847 0.00000 ]
Extrinsic parameters (position of right camera wrt left camera):
Rotation vector: om = [ -0.04239 0.02401 -0.00677 ]
Translation vector: T = [ 71.66430 -0.79025 -8.76546 ]
If you mean "I have calibrated my camera using a set of images with resolution X, so I got calibration matrix K; can I use this matrix with images of a different resolution Y?", the direct answer is no, you cannot, since the calibration matrix K has the form:
K = [a_x, 0,   c_x;
     0,   a_y, c_y;
     0,   0,   1  ]
where a_x = focal_length * density of pixels per mm in the x direction, a_y = focal_length * density of pixels per mm in the y direction (usually those densities are equal), and c_x is the translation of the image plane to the principal point in the x direction (similarly for c_y). When you output your calibration matrix K, you will see something like:
K = [a_x, 0,   320;
     0,   a_y, 180;
     0,   0,   1  ]
And yes, you can see that c_x = 320 = 640 / 2 and c_y = 180 = 360 / 2. So your calibration matrix is tied to the image resolution, and you cannot use it directly with any other resolution without changing the matrix K.
2. You have to divide the sensor size by the image size, i.e.
k_x = 1 / density_x = sensor_width / image_width
k_y = 1 / density_y = sensor_height / image_height
The image sensor is the tiny plane of photosensitive material that absorbs light inside your camera. Usually you can find this information in the camera manual; search for the sensor size.
EDIT: If you can't find the sensor size in the camera manual, which is common for webcams, you can try the following: calibrate your camera to obtain the matrix K; the values a_x and a_y contain the information you need. Since a_x = f * density, and you know the focal length (it is 2.3 mm, see here), you can find density = a_x / f. We also know that density is equal to image_width / sensor_width, so finally we have sensor_width = image_width / density = image_width * f / a_x. The same reasoning applies to sensor_height.
EDIT2: For example if you get:
Focal Length: fc_left = [ 1442.67707 1457.17435 ] ± [ 18.12442 19.46439 ]
So we have a_x = 1442.67707. From the conclusions above, and assuming an image size of 640 x 360, we get sensor width = 640 * 2.3 / 1442.67707 ≈ 1.02 mm.
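The whole computation as a small Python sketch (assuming a 640 x 360 image and the 2.3 mm focal length mentioned above):

# Recover sensor size and mm-per-pixel from the calibrated focal lengths (in pixels).
image_w, image_h = 640, 360
f_mm = 2.3                           # focal length of the C170 lens, as referenced above
a_x, a_y = 1442.67707, 1457.17435    # fc_left from the calibration output

sensor_w = image_w * f_mm / a_x      # ~1.02 mm
sensor_h = image_h * f_mm / a_y      # ~0.57 mm
k_x = sensor_w / image_w             # mm covered by one pixel horizontally (= f_mm / a_x)
k_y = sensor_h / image_h             # mm covered by one pixel vertically
print(sensor_w, sensor_h, k_x, k_y)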

mul function in HLSL: which should be the first parameter, the vector or the matrix?

I am learning HLSL shading, and in my vertex shader, I have code like this:
VS_OUTPUT vs_main(
    float4 inPos: POSITION,
    float2 Txr1: TEXCOORD0 )
{
    VS_OUTPUT Output;
    Output.Position = mul( inPos, matViewProjection );
    Output.Tex1 = Txr1;
    return( Output );
}
It works fine. But when I was typing in code from the book, it looked like this:
VS_OUTPUT vs_main(
    float4 inPos: POSITION,
    float2 Txr1: TEXCOORD0 )
{
    VS_OUTPUT Output;
    Output.Position = mul( matViewProjection, inPos );
    Output.Tex1 = Txr1;
    return( Output );
}
At first I thought maybe the order does not matter. However, when I swapped the parameters of mul in my code, it did not work. I don't know why.
BTW, I am using RenderMonkey.
This issue is known as pre- vs. post-multiplication.
By convention, matrices produced by D3DX are stored in row-major order. To produce proper results you have to pre-multiply: for matViewProjection to transform the vector inPos into clip space, inPos should appear on the left-hand side (the first parameter).
Order absolutely matters; matrix multiplication is not commutative. However, pre-multiplying by a matrix is the same as post-multiplying by the transpose of that matrix. To put this another way, if you were using the same matrix but stored in column-major order (transposed), then you would want to swap the operands.
Thus (vector on the right-hand side, also known as post-multiplication):
[ 0,   0,   0,   m41 ]   [ x ]
[ 0,   0,   0,   m42 ] * [ y ]
[ 0,   0,   0,   m43 ]   [ z ]
[ 0,   0,   0,   m44 ]   [ w ]
When the vector appears on the right-hand side, it is interpreted as a column vector.
is equivalent to (vector on the left-hand side, also known as pre-multiplication):
                 [ 0,   0,   0,   0   ]
[ x, y, z, w ] * [ 0,   0,   0,   0   ]
                 [ 0,   0,   0,   0   ]
                 [ m41, m42, m43, m44 ]
When the vector appears on the left-hand side, it is interpreted as a row vector.
There is no universally correct side, it depends on how the matrix is represented.
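A quick way to see the transpose equivalence outside HLSL (a NumPy sketch, not shader code): multiplying a column vector on the right of M gives the same numbers as multiplying the corresponding row vector on the left of M transposed.

import numpy as np

M = np.arange(16, dtype=float).reshape(4, 4)  # an arbitrary 4x4 matrix
v = np.array([1.0, 2.0, 3.0, 1.0])

post = M @ v   # vector on the right: treated as a column vector
pre = v @ M.T  # vector on the left of the transposed matrix: treated as a row vector
assert np.allclose(post, pre)  # identical results, matching the explanation above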
Not sure if this is caused by matrix order (row-major vs. column-major); take a look at Type modifier and the mul function.
Updated:
I have tested this in my project; using the keyword row_major makes the second case work:
row_major matrix matViewProjection;
mul(matViewProjection, inPos);
