I am currently experimenting with ORB SLAM 2 and a stereo camera like this. I am using 2.8mm and optionally 3.6mm lenses with a resolution of 640x480 pixels for the left and right camera/image.
ORB SLAM 2 lets me define several distortion/rectification parameters within the settings file (*.yaml), such as:
fx, fy, cx, cy
k1, k2, p1, p2
I conducted the OpenCV camera calibration using a chessboard as described here (9x7 inner corners and 70 mm square length). Later on I used this automated calibration program from MRPT, which gives me the same results with fewer stumbling blocks.
However, ORB SLAM 2 lets me define these additional parameters for pre-rectifying the images (if I understand this correctly):
D: 1x5 Matrix -> Distortion coefficients acquired from calibration (k1, k2, p1, p2, k3)?
K: 3x3 Matrix -> Intrinsic matrix acquired from calibration (fx, fy, cx, cy)?
R: 3x3 Matrix -> Rectification transformation?
P: 3x4 Matrix -> New projection matrix?
My questions are the following (see below for an example settings.yaml file):
A.) Is my assumption correct that D contains the distortion coefficients and K is the intrinsic matrix acquired from the chessboard calibration procedure?
B.) Is defining fx, fy, cx, cy in settings.yaml sufficient for pre-rectifying the images and successful operation of ORB SLAM 2?
C.) Do I need the R and P matrices for successful operation of ORB SLAM 2?
D.) How can I acquire the R and P matrices? The OpenCV camera calibration procedure with the chessboard does not provide these matrices, correct?
Here's an example of the above-mentioned settings.yaml file of ORB SLAM 2:
%YAML:1.0
#--------------------------------------------------------------------------------------------
# Camera Parameters. Adjust them!
#--------------------------------------------------------------------------------------------
# Camera calibration and distortion parameters (OpenCV)
Camera.fx: 646.53807309613160
Camera.fy: 647.36136487241527
Camera.cx: 320.94123353073792
Camera.cy: 219.07092188981900
Camera.k1: -0.43338537102343577
Camera.k2: 0.46801812273859494
Camera.p1: 0.0039978632628183738
Camera.p2: 0.00023265675941025371
Camera.width: 640
Camera.height: 480
# Camera frames per second
Camera.fps: 20.0
# stereo baseline times fx
Camera.bf: 38.76
# Color order of the images (0: BGR, 1: RGB. It is ignored if images are grayscale)
Camera.RGB: 1
# Close/Far threshold. Baseline times.
ThDepth: 50
#--------------------------------------------------------------------------------------------
# Stereo Rectification. Only if you need to pre-rectify the images.
# Camera.fx, .fy, etc must be the same as in LEFT.P
#--------------------------------------------------------------------------------------------
LEFT.width: 640
LEFT.height: 480
LEFT.D: !!opencv-matrix
   rows: 1
   cols: 5
   dt: d
   data: [-0.28340811, 0.07395907, 0.00019359, 1.76187114e-05, 0.0]
LEFT.K: !!opencv-matrix
   rows: 3
   cols: 3
   dt: d
   data: [458.654, 0.0, 367.215, 0.0, 457.296, 248.375, 0.0, 0.0, 1.0]
LEFT.R: !!opencv-matrix
   rows: 3
   cols: 3
   dt: d
   data: [0.999966347530033, -0.001422739138722922, 0.008079580483432283, 0.001365741834644127, 0.9999741760894847, 0.007055629199258132, -0.008089410156878961, -0.007044357138835809, 0.9999424675829176]
LEFT.P: !!opencv-matrix
   rows: 3
   cols: 4
   dt: d
   data: [435.2046959714599, 0, 367.4517211914062, 0, 0, 435.2046959714599, 252.2008514404297, 0, 0, 0, 1, 0]
RIGHT.width: 640
RIGHT.height: 480
RIGHT.D: !!opencv-matrix
   rows: 1
   cols: 5
   dt: d
   data: [-0.28368365, 0.07451284, -0.00010473, -3.555907e-05, 0.0]
RIGHT.K: !!opencv-matrix
   rows: 3
   cols: 3
   dt: d
   data: [457.587, 0.0, 379.999, 0.0, 456.134, 255.238, 0.0, 0.0, 1]
RIGHT.R: !!opencv-matrix
   rows: 3
   cols: 3
   dt: d
   data: [0.9999633526194376, -0.003625811871560086, 0.007755443660172947, 0.003680398547259526, 0.9999684752771629, -0.007035845251224894, -0.007729688520722713, 0.007064130529506649, 0.999945173484644]
RIGHT.P: !!opencv-matrix
   rows: 3
   cols: 4
   dt: d
   data: [435.2046959714599, 0, 367.4517211914062, -47.90639384423901, 0, 435.2046959714599, 252.2008514404297, 0, 0, 0, 1, 0]
#--------------------------------------------------------------------------------------------
# ORB Parameters
#--------------------------------------------------------------------------------------------
# ORB Extractor: Number of features per image
ORBextractor.nFeatures: 800
# ORB Extractor: Scale factor between levels in the scale pyramid
ORBextractor.scaleFactor: 1.2
# ORB Extractor: Number of levels in the scale pyramid
ORBextractor.nLevels: 8
# ORB Extractor: Fast threshold
# Image is divided in a grid. At each cell FAST are extracted imposing a minimum response.
# Firstly we impose iniThFAST. If no corners are detected we impose a lower value minThFAST
# You can lower these values if your images have low contrast
ORBextractor.iniThFAST: 12
ORBextractor.minThFAST: 3
#--------------------------------------------------------------------------------------------
# Viewer Parameters
#--------------------------------------------------------------------------------------------
Viewer.KeyFrameSize: 0.05
Viewer.KeyFrameLineWidth: 1
Viewer.GraphLineWidth: 0.9
Viewer.PointSize: 2
Viewer.CameraSize: 0.08
Viewer.CameraLineWidth: 3
Viewer.ViewpointX: 0
Viewer.ViewpointY: -0.7
Viewer.ViewpointZ: -1.8
Viewer.ViewpointF: 500
There are several calibration toolboxes commonly used for calibrating monocular, stereo, or multi-camera setups.
The first one is ros_camera_calibration. When running ORB SLAM, I prefer to use this package to obtain the intrinsic parameters of a single moving camera. The intrinsic parameters, distortion coefficients, and projection matrices are acquired after moving the calibration board in front of the camera.
The second one, which I used recently, is Kalibr. It is designed not only to calibrate multi-camera rigs but also to jointly calibrate the cameras and an inertial measurement unit (IMU).
You can also use MATLAB to get the camera's intrinsic parameters.
As for your questions, here are my imperfect answers.
Q.A: K (fx, fy, cx, cy) holds the intrinsic parameters of the camera, and k1, k2, p1, p2 are the distortion coefficients, respectively.
Q.B: As far as I'm concerned, obtaining the intrinsic parameters fx, fy, cx, cy is sufficient to run ORB SLAM 2 with your own cameras.
Q.C & D: If you choose to use the ros_camera_calibration package, you will receive the projection matrix and the rectification transformation at the end of the calibration.
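If you stay with plain OpenCV instead of ROS, you can also get R and P yourself: run cv2.stereoCalibrate on the chessboard points of both cameras and feed its result into cv2.stereoRectify. A rough sketch, assuming the point lists and per-camera intrinsics are whatever your own chessboard calibration produced:

import cv2

def compute_rectification(objpoints, imgpoints_l, imgpoints_r,
                          K_l, D_l, K_r, D_r, image_size=(640, 480)):
    """Derive R and P for each camera from an existing chessboard calibration.

    objpoints/imgpoints_* are the object and image points collected during the
    monocular calibrations; K_*, D_* are the intrinsics and distortion
    coefficients obtained there.
    """
    # Keep the monocular intrinsics fixed and estimate only the relative pose
    flags = cv2.CALIB_FIX_INTRINSIC
    ret, K_l, D_l, K_r, D_r, R, T, E, F = cv2.stereoCalibrate(
        objpoints, imgpoints_l, imgpoints_r,
        K_l, D_l, K_r, D_r, image_size, flags=flags)

    # R1/R2 are the rectification rotations, P1/P2 the new projection matrices
    R1, R2, P1, P2, Q, roi_l, roi_r = cv2.stereoRectify(
        K_l, D_l, K_r, D_r, image_size, R, T)
    return R1, R2, P1, P2

As far as I know, the stereo examples shipped with ORB SLAM 2 feed exactly these matrices (together with K and D) into cv::initUndistortRectifyMap to pre-rectify the incoming frames, so R1/P1 correspond to LEFT.R/LEFT.P and R2/P2 to RIGHT.R/RIGHT.P in the settings file.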
I am fairly new to computer vision and OpenCV, so my details might not be thorough, sorry.
I am currently trying to calibrate my stereo camera to get both intrinsic and extrinsic parameters. My end goal is to undistort and rectify the image pairs and use correspondences to find certain points in each image.
The stereo camera I am using: http://www.webcamerausb.com/elp-synchronized-dual-lens-stereo-usb-camera-13mp-hd-960p-webcam-3d-vr-web-camera-module-with-13-cmos-ov9715-image-sensor-camera-module-mini-industrial-usb20-web-cam-plugplay-for-androidlinuxwindows-p-285.html
Images taken: 20 pairs of stereo images
This is the snippet of code I am using for camera calibration:
# objp, criteria, objpoints, imgpoints, images, width and height are defined
# earlier in the full script; only the calibration loop is shown here.
for fname in images:
    img = cv2.imread(fname)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    ret, corners = cv2.findChessboardCorners(gray, (width, height), None)
    if ret:
        print(fname)
        objpoints.append(objp)
        corners2 = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
        imgpoints.append(corners2)
        img = cv2.drawChessboardCorners(img, (width, height), corners2, ret)

ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)
I calibrated the left lens and the right lens separately. However, my issue is that the focal lengths in the camera matrices of the two lenses are very different.
LEFT LENS:
K: !!opencv-matrix
   rows: 3
   cols: 3
   dt: d
   data: [ 2.7630975322098088e+03, 0., 6.3154856415725897e+02, 0., 2.8097306962970929e+03, 4.9132766901283384e+02, 0., 0., 1. ]
D: !!opencv-matrix
   rows: 1
   cols: 5
   dt: d
   data: [ -4.6255313067932485e-01, -5.5060376742654917e+01, -9.9065660338455458e-02, 4.4853567872048555e-02, 8.3136561769726973e+02 ]
RIGHT LENS:
K: !!opencv-matrix
   rows: 3
   cols: 3
   dt: d
   data: [ 1.1603188984395067e+03, 0., 6.4378728024327643e+02, 0., 1.1556999845227924e+03, 5.0433004252302896e+02, 0., 0., 1. ]
D: !!opencv-matrix
   rows: 1
   cols: 5
   dt: d
   data: [ -6.0796521210889765e-01, 6.0622199106747054e-01, -1.4097123552960564e-02, 1.4581825861357409e-02, -6.8179582332173561e-01 ]
What am I doing wrong and how should I correct the issues?
Thanks!
What I have tried:
Calibrating each camera individually
Calibrating both cameras at once
It seems you are calibrating monocular cameras only. Your current code is not computing the relative pose of one camera with respect to the other. To achieve this part, you can have a look at the function 'cv.stereoCalibrate'
https://docs.opencv.org/4.6.0/d9/d0c/group__calib3d.html#ga91018d80e2a93ade37539f01e6f07de5
Solving the problem of monocular calibration first will make sure you have good input data to be used for the next stage of stereo calibration.
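For reference, here is a minimal sketch of that extra step; the objpoints/imgpoints lists and the mtx/dist values are assumed to be the per-lens results that the code in the question already computes:

import cv2

def relative_pose(objpoints, imgpoints_left, imgpoints_right,
                  mtx_left, dist_left, mtx_right, dist_right, image_size):
    """Estimate the pose (R, T) of the right camera with respect to the left one."""
    ret, K1, D1, K2, D2, R, T, E, F = cv2.stereoCalibrate(
        objpoints, imgpoints_left, imgpoints_right,
        mtx_left, dist_left, mtx_right, dist_right, image_size,
        flags=cv2.CALIB_FIX_INTRINSIC)  # keep the monocular intrinsics fixed
    return R, T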
I do not see anything fundamentally wrong with your code sample. With the entire code, and some sample input images, it would be easier to find the origin of the problem.
CalibPro provides an easy-to-use interface to calibrate cameras without writing a single line of code. I would definitely recommend using it to see if you obtain the same results.
[Disclaimer] I am the founder of Calibpro. I am happy to help you to use the platform and I'd love to have your feedback.
I am trying to project a 2D image from a 3D point cloud. The 3D point cloud is constructed using RGB and depth images whose FOV is 120 degrees. Once the 3D point cloud is constructed, I want to reproject a 2D image from it using an 80 degree FOV. The resolution of the images used for building the 3D point cloud and of the reprojected 2D image is the same [width: 1820, height: 940].
I calculated the intrinsic matrix by following the steps below.
The camera model used here is pinhole.
Step 1: Calculating focal lengths fx and fy
hfov = fov / 360. * 2. * np.pi
fx = img_w / (2. * np.tan(hfov / 2.))
vfov = 2. * np.arctan(np.tan(hfov / 2) * img_h / img_w)
fy = img_h / (2. * np.tan(vfov / 2.))
(fx and fy are in pixel units)
Step 2: Calculating image centers u0, v0
u0, v0 = img_w / 2, img_h / 2
Step 3: Form the matrix
[[fx, 0, u0, 0],
[0, fy, v0, 0],
[0, 0, 1, 0],
[0, 0, 0, 1]]
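For what it's worth, here are the three steps written out end to end for the stated resolution and an 80 degree FOV (only a sanity-check sketch; the numbers in the comments are simply what the formulas evaluate to):

import numpy as np

img_w, img_h = 1820, 940   # resolution stated above
fov = 80.0                 # horizontal FOV in degrees for the reprojection

hfov = fov / 360. * 2. * np.pi                            # Step 1
fx = img_w / (2. * np.tan(hfov / 2.))
vfov = 2. * np.arctan(np.tan(hfov / 2) * img_h / img_w)
fy = img_h / (2. * np.tan(vfov / 2.))                     # equals fx for square pixels
u0, v0 = img_w / 2, img_h / 2                             # Step 2

K = np.array([[fx, 0., u0, 0.],                           # Step 3
              [0., fy, v0, 0.],
              [0., 0., 1., 0.],
              [0., 0., 0., 1.]])
print(fx, fy)   # about 1084 px; the same formula gives about 525 px for 120 degrees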
I used this intrinsic matrix and the extrinsic matrix (calculated from x, y, z, roll, pitch and yaw) to reproject the 2D image with an 80 degree FOV, using the Open3D API, from the 3D point cloud that was constructed with a 120 degree FOV.
When I follow the above steps, I end up with a 2D image that is zoomed in much more than expected. However, if I reproject the 2D image with a 120 degree FOV, the resulting image is close to the expectation (some extra regions are projected, but the camera-to-plane distance is perfect).
You can find the sample output below
Reprojected image with FOV 80 degree from the point cloud (ego vehicle will be missing because it is not part of the RGB-D images)
Reference image (taken with same FOV (80 degree), resolution, and extrinsic parameters)
I am very certain that there is no issue with the extrinsic matrix calculation or the reprojection part, because they are well tested and verified. The zoom-in effect comes from the intrinsic matrix.
I am assuming that I am missing or doing something wrong while calculating fx and fy. Perhaps an adjustment factor is needed when dealing with different FOVs, or there are some inconsistencies in the units.
Any help here would be appreciated.
I am facing the challenge of recovering the Z coordinate from an inverse disparity map and the camera matrix K. The Z coordinate does not have to be metric, but it has to be scale-aware.
K = [[721, 0, 596],
[0, 721, 149],
[0, 0, 1]]
I know that z = f*(T/d), where T is the distance between the two co-planar cameras, f is the focal length, and d is the disparity. However, I am not given any extrinsics other than that the camera is 1.5 m above the floor. The disparity is already an inverse disparity, so I guess z = f*d? Also, what do I do with the other parameters in my K matrix (596, 149)? Is there any implementation in a library like OpenCV to recover the scale-aware 3D coordinates?
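To make the z = f*(T/d) relation and the role of the remaining entries of K concrete, here is a small sketch; the baseline T and the sample pixel are made-up placeholders, since the question does not provide them:

# Intrinsics taken from the K matrix above
fx = fy = 721.0
cx, cy = 596.0, 149.0

T = 0.54               # hypothetical stereo baseline in metres (not given here)
u, v = 700.0, 200.0    # sample pixel coordinates
d = 10.0               # (non-inverse) disparity at that pixel, in pixels

Z = fx * T / d           # depth along the optical axis
X = (u - cx) * Z / fx    # cx, cy (596, 149) re-centre the pixel before back-projection
Y = (v - cy) * Z / fy
print(X, Y, Z)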
I've already computed the Fundamental Matrix of a stereo pair through corresponding points, found using SURF. According to Hartley and Zisserman, the Essential Matrix is computed doing:
E = K.t() * F * K
How do I get K? Is there another way to compute E?
I don't know where you got that formula, but the correct one is
E = K'^T . F . K (see Hartley & Zisserman, §9.6, page 257 of the second edition)
K is the intrinsic camera matrix, holding the scale factors and the position of the image center, expressed in pixel units.
    | \alpha_u     0      u_0 |
K = |     0    \alpha_u   v_0 |
    |     0        0       1  |
(sorry, LaTeX is not supported on SO)
Edit: To get those values, you can either:
calibrate the camera
compute approximate values if you have the manufacturer data. If the lens is correctly centered on the sensor, then u_0 and v_0 are half the image width and height, respectively. And alpha = k.f, with f the focal length (in metres) and k the pixel scale factor: if you have a pixel of, say, 6 um, then k = 1/6um.
For example, if the lens is 8 mm and the pixel size 8 um, then alpha = 1000.
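A tiny sketch of that back-of-the-envelope computation, using the example numbers above (the 640x480 resolution is only a placeholder):

f_mm = 8.0                  # lens focal length in mm
pixel_um = 8.0              # pixel pitch in micrometres
width, height = 640, 480    # image resolution (placeholder)

alpha = f_mm * 1000.0 / pixel_um     # 8 mm / 8 um = 1000 px
u0, v0 = width / 2.0, height / 2.0
K = [[alpha, 0.0, u0],
     [0.0, alpha, v0],
     [0.0, 0.0, 1.0]]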
Computing E
Sure, there are several ways to compute E. For example, if you have a strongly calibrated camera rig, then you can extract R and t (the rotation matrix and translation vector) between the two cameras, and E is defined as the product of the skew-symmetric matrix of t and the matrix R.
But if you have the book, all of this is inside.
Edit: Just noticed there is even a Wikipedia page on this topic!
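Both routes, written as a rough NumPy sketch (F, the intrinsics and the pose are whatever your own pipeline produced):

import numpy as np

def essential_from_fundamental(F, K1, K2):
    """E = K2^T . F . K1, i.e. E = K'^T . F . K with K' the second camera."""
    return K2.T @ F @ K1

def skew(t):
    """Skew-symmetric matrix [t]_x, so that skew(t) @ x == np.cross(t, x)."""
    t = np.asarray(t, dtype=float).ravel()
    return np.array([[0., -t[2], t[1]],
                     [t[2], 0., -t[0]],
                     [-t[1], t[0], 0.]])

def essential_from_pose(R, t):
    """E = [t]_x . R for a strongly calibrated rig."""
    return skew(t) @ R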
This question belongs to the topic 'Structure from Motion'.
Suppose there are 3 images. There are point correspondences between images 1-2 and images 2-3, but there is no common point between images 1 and 3. I got the RT (rotation and translation) matrix of image 2 with respect to image 1, RT12 (considering the RT of image 1 as [I|0], i.e. rotation is identity and translation is zero). Let's split RT12 into R12 and T12.
Similarly, I got RT23 considering the RT of image 2 as [I|0]. So now I have R23 and T23, which are relative to image 2, not to image 1. Now I want to find R13 and T13.
For a synthetic dataset, the equation R13 = R23*R12 gives the correct R (verified, as I actually have R13 precomputed). Similarly, I expected T13 to be T23 + T12, but the translation computed this way is bad. Since I have the ground-truth results with me, I could verify that the rotation was estimated nicely, but not the translation. Any ideas?
This is a simple matrix block-multiplication problem, but you have to remember that you are actually dealing with 4x4 matrices (associated with rigid transformations in homogeneous 3D coordinates) and not 3x4 matrices.
Your 3x4 RT matrices actually correspond to the first three rows of 4x4 matrices A, whose last rows are [0, 0, 0, 1]:
RT23 -> A23=[R23, T23; 0, 0, 0, 1]
RT12 -> A12=[R12, T12; 0, 0, 0, 1]
Then, if you do the matrix block-multiplication on the 4x4 matrices (A13 = A23 * A12), you'll quickly find out that:
R13 = R23 * R12
T13 = R23 * T12 + T23
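The same composition, spelled out as a small NumPy sketch (the pose values are placeholders):

import numpy as np

def to_homogeneous(R, T):
    """Stack a 3x3 rotation and a translation vector into a 4x4 rigid transform."""
    A = np.eye(4)
    A[:3, :3] = R
    A[:3, 3] = np.asarray(T).ravel()
    return A

# Placeholder poses, purely illustrative
R12, T12 = np.eye(3), np.array([0.1, 0.0, 0.0])
R23, T23 = np.eye(3), np.array([0.0, 0.2, 0.0])

A13 = to_homogeneous(R23, T23) @ to_homogeneous(R12, T12)
R13 = A13[:3, :3]   # same as R23 @ R12
T13 = A13[:3, 3]    # same as R23 @ T12 + T23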