I am fairly new to computer vision and OpenCV, so my details might not be thorough, sorry.
I am currently trying to calibrate my stereo camera to get both the intrinsic and extrinsic parameters. My end goal is to undistort and rectify the image pairs and use correspondence to find certain points in each image.
The stereo camera I am using: http://www.webcamerausb.com/elp-synchronized-dual-lens-stereo-usb-camera-13mp-hd-960p-webcam-3d-vr-web-camera-module-with-13-cmos-ov9715-image-sensor-camera-module-mini-industrial-usb20-web-cam-plugplay-for-androidlinuxwindows-p-285.html
Images taken: 20 pairs of stereo images
This is the snippet of code I am using for camera calibration:
for fname in images:
    img = cv2.imread(fname)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    ret, corners = cv2.findChessboardCorners(gray, (width, height), None)
    if ret:
        print(fname)
        objpoints.append(objp)
        corners2 = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
        imgpoints.append(corners2)
        img = cv2.drawChessboardCorners(img, (width, height), corners2, ret)

ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)
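For context, the snippet assumes objpoints, imgpoints, objp, criteria, images, width and height were prepared beforehand. A minimal sketch of that setup (the pattern size, square size and image path here are placeholders, not my actual values):

import glob
import numpy as np
import cv2

width, height = 9, 6        # inner corners of the chessboard pattern (placeholder)
square_size = 25.0          # square edge length in mm (placeholder)

# One ideal, planar chessboard: a grid of (x, y, 0) points scaled by the square size
objp = np.zeros((width * height, 3), np.float32)
objp[:, :2] = np.mgrid[0:width, 0:height].T.reshape(-1, 2) * square_size

criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
objpoints, imgpoints = [], []          # 3D board points and matching 2D detections
images = glob.glob('left/*.png')       # placeholder path to one camera's images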
I calibrated the left lens and the right lens separately. However, my issue is that the focal lengths in the two camera matrices come out very different.
LEFT LENS:
K: !!opencv-matrix
rows: 3
cols: 3
dt: d
data: [ 2.7630975322098088e+03, 0., 6.3154856415725897e+02, 0.,
2.8097306962970929e+03, 4.9132766901283384e+02, 0., 0., 1. ]
D: !!opencv-matrix
rows: 1
cols: 5
dt: d
data: [ -4.6255313067932485e-01, -5.5060376742654917e+01,
-9.9065660338455458e-02, 4.4853567872048555e-02,
8.3136561769726973e+02 ]
RIGHT LENS:
K: !!opencv-matrix
rows: 3
cols: 3
dt: d
data: [ 1.1603188984395067e+03, 0., 6.4378728024327643e+02, 0.,
1.1556999845227924e+03, 5.0433004252302896e+02, 0., 0., 1. ]
D: !!opencv-matrix
rows: 1
cols: 5
dt: d
data: [ -6.0796521210889765e-01, 6.0622199106747054e-01,
-1.4097123552960564e-02, 1.4581825861357409e-02,
-6.8179582332173561e-01 ]
What am I doing wrong and how should I correct the issues?
Thanks!
What I have tried:
Calibrating each camera individually
Calibrating both cameras at once
It seems you are calibrating the two cameras as monocular cameras only. Your current code does not compute the relative pose of one camera with respect to the other. For that part, have a look at the function cv.stereoCalibrate:
https://docs.opencv.org/4.6.0/d9/d0c/group__calib3d.html#ga91018d80e2a93ade37539f01e6f07de5
Solving the problem of monocular calibration first will make sure you have good input data to be used for the next stage of stereo calibration.
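For example, a minimal sketch of that stereo stage (assuming you keep the per-camera results mtx_l, dist_l, mtx_r, dist_r from your monocular runs, plus one shared objpoints list and the matching imgpoints_left / imgpoints_right; the names are only placeholders):

import cv2

# Keep the already-estimated intrinsics fixed and estimate only the extrinsics R, T (plus E, F)
flags = cv2.CALIB_FIX_INTRINSIC
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 1e-5)

ret, mtx_l, dist_l, mtx_r, dist_r, R, T, E, F = cv2.stereoCalibrate(
    objpoints, imgpoints_left, imgpoints_right,
    mtx_l, dist_l, mtx_r, dist_r,
    gray.shape[::-1], flags=flags, criteria=criteria)

R and T then describe the pose of the second camera relative to the first, which is what the later rectification step needs.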
I do not see anything fundamentally wrong with your code sample. With the entire code, and some sample input images, it would be easier to find the origin of the problem.
CalibPro provides an easy-to-use interface to calibrate cameras without writing a single line of code. I would definitely recommend using it to see whether you obtain the same results.
[Disclaimer] I am the founder of Calibpro. I am happy to help you to use the platform and I'd love to have your feedback.
I've constructed a basic scene in Unity, taken some photos using Unity's Physical Camera object, and am now trying to use OpenCV to project known 3D points in the scene onto the image. For reference, the scene is just a few textured spheres and an intersecting plane (here is one of the images taken).
My method is:
Determine the camera intrinsic matrix from the Physical Camera parameters. As I understand, Unity Physical Cameras experience no distortion. I calculate the parameters in Unity as:
float pixelAspectRatio = (float)cam.pixelWidth / (float)cam.pixelHeight;
float f_x = cam.focalLength * ((float)cam.pixelWidth / cam.sensorSize.x);
float f_y = cam.focalLength * pixelAspectRatio * ((float)cam.pixelHeight / cam.sensorSize.y);
float x_0 = (float)cam.pixelWidth / 2;
float y_0 = (float)cam.pixelHeight / 2;
Apply change of basis matrices to convert the 3D points and Unity camera position vector/rotation matrix from Unity coordinates (left-handed) to the coordinates OpenCV uses (right-handed). As I understand it, my choice of change of basis matrix is somewhat arbitrary? It only needs to swap orientation, and I can apply another (orientation-preserving) matrix to the rotation matrix afterwards to ensure the Camera orientation is in OpenCV coordinates (following the convention here). In case I'm unclear, here's the code:
COB = np.array(  # Change of basis matrix between Unity and OpenCV
    [[1., 0., 0.],
     [0., 0., 1.],
     [0., 1., 0.]]
)

def tvec_unity_to_cv(tvec: np.ndarray):  # tvec is a 3x1 position vector
    return COB @ tvec

ROT_ADJUSTMENT = np.array(  # Secondary COB to ensure camera axes follow OpenCV convention after rotation
    [[1., 0., 0.],
     [0., 0., -1.],
     [0., 1., 0.]]
)

def camera_rot_unity_to_cv(rot: np.ndarray):  # rot is a 3x3 rotation matrix
    trans = ROT_ADJUSTMENT @ COB
    return trans @ rot @ trans
Both COB and trans are their own inverses (involutory), so I can use them in place of their inverses (I think?!).
Finally, I use cv.projectPoints(points_3d, rvec, tvec, K, distCoeffs=None) to project these points onto my image.
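For concreteness, the call I am aiming at looks roughly like this (R_world_to_cam, t_world_to_cam and points_3d are placeholder names for the converted quantities; note that projectPoints expects the world-to-camera transform, not the camera pose itself):

import numpy as np
import cv2

# Intrinsics assembled from the Unity Physical Camera values computed above
K = np.array([[f_x, 0.0, x_0],
              [0.0, f_y, y_0],
              [0.0, 0.0, 1.0]], dtype=np.float64)

# World-to-camera rotation (3x3, placeholder) converted to a Rodrigues vector,
# plus the corresponding 3x1 translation (placeholder)
rvec, _ = cv2.Rodrigues(R_world_to_cam)
image_pts, _ = cv2.projectPoints(points_3d, rvec, t_world_to_cam, K, None)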
However, it all falls apart here: my projected points don't appear in the correct spots. Before you ask, I'm certain that I'm using the correct 3D positions (they are very easy to identify in Unity) - I'm currently trying to plot the plane corners and the top of the spheres, like this, but it comes out like this (excuse the colour shifting).
The task itself doesn't seem complex, so I'm wondering how I'm getting it wrong. Does anybody have any tips? Thanks!
I'm new here, so if I break any rules please help me improve.
I'm doing some work on visual localization with a working radius of about 300 m, so I use a large camera with 4912x3684 resolution. However, my camera calibration with a chessboard ends up with a high reprojection error of over 3.6 px.
The camera_matrix is
[ 3.0126352098515147e+05, 0., 2456.,
0., 4.3598609578377334e+05, 1842.,
0., 0., 1. ]
I realized that fx is far from fy. The nominal pixel size is 1.25 µm and the focal length is 755 mm.
I also followed a suggestion from this question, FindChessboardCorners cannot detect chessboard on very large images by long focal length lens:
The likely correct way to proceed is to start at a lower resolution (i.e. downsizing), then scale up the positions of the corners thus found, and use them as the initial estimates for a run of cvFindCornersSubpix at full resolution.
So I resize the input image before cv::findChessboardCorners(), as in the code below:
cv::Size msize(1228, 921);   // for resolution 4912x3684
int downsize = 4;            // downsize scale factor
cv::Mat small;               // temporary image holding the downsized input
cv::resize(imageInput, small, msize);
bool ok = findChessboardCorners(small, board_size, image_points, CV_CALIB_CB_ADAPTIVE_THRESH | CV_CALIB_CB_NORMALIZE_IMAGE);
if (ok)
{
    // scale the detected corners back up to full resolution
    for (size_t j = 0; j < image_points.size(); j++)
    {
        image_points[j].x = image_points[j].x * downsize;
        image_points[j].y = image_points[j].y * downsize;
    }
    Mat view_gray;
    cout << "imageInput.channels()=" << imageInput.channels() << endl;
    cvtColor(imageInput, view_gray, CV_RGB2GRAY);
    cv::cornerSubPix(view_gray, image_points, cv::Size(11, 11), cv::Size(-1, -1), cv::TermCriteria(CV_TERMCRIT_ITER + CV_TERMCRIT_EPS, 40, 0.01));
    image_points_seq.push_back(image_points);
}
double err_first = calibrateCamera(object_points_seq, image_points_seq, image_size, cameraMatrix, distCoeffs, rvecsMat, tvecsMat, CV_CALIB_FIX_K3 | CALIB_FIX_PRINCIPAL_POINT);
And here are my input images:
images for calibration
Please tell me how to get an accurate calibration result!!!
For any calibration to be accurate, you should consider the following:
Ensure the focus is correct by verifying it with a simple focus chart.
The environment matters; the scene should not be too reflective.
Calibration depends on the calibration target you use, so it is critical that the chart is flat. Even millimetre-level bulges will affect the calibration.
Make sure the pattern reaches into the image corners to get better distortion coefficients.
Use different pattern positions to cover as much of the field of view as possible.
Apart from all these, compute the calibration error for each individual image (a small sketch follows below), so you can see which images have a high error and which are good. Unfocused and blurred images should simply be discarded from the calibration process. It is an easy process if you give it patient time. Have a good time calibrating.
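Shown in Python for brevity, a minimal sketch of that per-image check (assuming objpoints, imgpoints and the calibrateCamera outputs mtx, dist, rvecs, tvecs are available):

import numpy as np
import cv2

for i in range(len(objpoints)):
    # Re-project the board points with the estimated pose and intrinsics
    proj, _ = cv2.projectPoints(objpoints[i], rvecs[i], tvecs[i], mtx, dist)
    diff = proj.reshape(-1, 2) - imgpoints[i].reshape(-1, 2)
    err = np.sqrt((diff ** 2).sum(axis=1)).mean()   # mean point distance in pixels
    print("image %d: mean reprojection error = %.3f px" % (i, err))

Views with a clearly higher error than the rest are good candidates for removal before re-running the calibration.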
I am currently experimenting with ORB SLAM 2 and a stereo camera like this. I am using 2.8 mm and optionally 3.6 mm lenses with a resolution of 640x480 pixels for the left and right camera/image.
ORB SLAM 2 lets me define several distortion/rectification parameters within the settings file (*.yaml), such as:
fx, fy, cx, cy
k1, k2, p1, p2
I conducted the OpenCV camera calibration using a chessboard as described here (9x7 inner corners and 70 mm square length). Later on I used this automated calibration program from MRPT, which gives me the same results with fewer stumbling blocks.
However, ORB SLAM 2 lets me define these additional parameters for pre-rectifying the images (if I understand this correctly):
D: 1x5 matrix -> distortion coefficients acquired from calibration (k1, k2, p1, p2, k3)?
K: 3x3 matrix -> intrinsic matrix acquired from calibration (fx, fy, cx, cy)?
R: 3x3 matrix -> rectification transformation?
P: 3x4 matrix -> new projection matrix?
My questions are the following (see below for an example settings.yaml file):
A.) Is my assumption correct that D holds the distortion coefficients and K is the intrinsic matrix acquired from the chessboard calibration procedure?
B.) Is defining fx, fy, cx, cy in settings.yaml sufficient for pre-rectifying the images and successful operation of ORB SLAM 2?
C.) Do I need the R and P matrices for successful operation of ORB SLAM 2?
D.) How can I acquire the R and P matrices? The OpenCV camera calibration procedure with the chessboard does not provide these matrices, correct?
Here's an example of the above mentioned settings.yaml file of ORB SLAM 2:
%YAML:1.0
#--------------------------------------------------------------------------------------------
# Camera Parameters. Adjust them!
#--------------------------------------------------------------------------------------------
# Camera calibration and distortion parameters (OpenCV)
Camera.fx: 646.53807309613160
Camera.fy: 647.36136487241527
Camera.cx: 320.94123353073792
Camera.cy: 219.07092188981900
Camera.k1: -0.43338537102343577
Camera.k2: 0.46801812273859494
Camera.p1: 0.0039978632628183738
Camera.p2: 0.00023265675941025371
Camera.width: 640
Camera.height: 480
# Camera frames per second
Camera.fps: 20.0
# stereo baseline times fx
Camera.bf: 38.76
# Color order of the images (0: BGR, 1: RGB. It is ignored if images are grayscale)
Camera.RGB: 1
# Close/Far threshold. Baseline times.
ThDepth: 50
#--------------------------------------------------------------------------------------------
# Stereo Rectification. Only if you need to pre-rectify the images.
# Camera.fx, .fy, etc must be the same as in LEFT.P
#--------------------------------------------------------------------------------------------
LEFT.width: 640
LEFT.height: 480
LEFT.D: !!opencv-matrix
rows: 1
cols: 5
dt: d
data:[-0.28340811, 0.07395907, 0.00019359, 1.76187114e-05, 0.0]
LEFT.K: !!opencv-matrix
rows: 3
cols: 3
dt: d
data: [458.654, 0.0, 367.215, 0.0, 457.296, 248.375, 0.0, 0.0, 1.0]
LEFT.R: !!opencv-matrix
rows: 3
cols: 3
dt: d
data: [0.999966347530033, -0.001422739138722922, 0.008079580483432283, 0.001365741834644127, 0.9999741760894847, 0.007055629199258132, -0.008089410156878961, -0.007044357138835809, 0.9999424675829176]
LEFT.P: !!opencv-matrix
rows: 3
cols: 4
dt: d
data: [435.2046959714599, 0, 367.4517211914062, 0, 0, 435.2046959714599, 252.2008514404297, 0, 0, 0, 1, 0]
RIGHT.width: 640
RIGHT.height: 480
RIGHT.D: !!opencv-matrix
rows: 1
cols: 5
dt: d
data:[-0.28368365, 0.07451284, -0.00010473, -3.555907e-05, 0.0]
RIGHT.K: !!opencv-matrix
rows: 3
cols: 3
dt: d
data: [457.587, 0.0, 379.999, 0.0, 456.134, 255.238, 0.0, 0.0, 1]
RIGHT.R: !!opencv-matrix
rows: 3
cols: 3
dt: d
data: [0.9999633526194376, -0.003625811871560086, 0.007755443660172947, 0.003680398547259526, 0.9999684752771629, -0.007035845251224894, -0.007729688520722713, 0.007064130529506649, 0.999945173484644]
RIGHT.P: !!opencv-matrix
rows: 3
cols: 4
dt: d
data: [435.2046959714599, 0, 367.4517211914062, -47.90639384423901, 0, 435.2046959714599, 252.2008514404297, 0, 0, 0, 1, 0]
#--------------------------------------------------------------------------------------------
# ORB Parameters
#--------------------------------------------------------------------------------------------
# ORB Extractor: Number of features per image
ORBextractor.nFeatures: 800
# ORB Extractor: Scale factor between levels in the scale pyramid
ORBextractor.scaleFactor: 1.2
# ORB Extractor: Number of levels in the scale pyramid
ORBextractor.nLevels: 8
# ORB Extractor: Fast threshold
# Image is divided in a grid. At each cell FAST are extracted imposing a minimum response.
# Firstly we impose iniThFAST. If no corners are detected we impose a lower value minThFAST
# You can lower these values if your images have low contrast
ORBextractor.iniThFAST: 12
ORBextractor.minThFAST: 3
#--------------------------------------------------------------------------------------------
# Viewer Parameters
#--------------------------------------------------------------------------------------------
Viewer.KeyFrameSize: 0.05
Viewer.KeyFrameLineWidth: 1
Viewer.GraphLineWidth: 0.9
Viewer.PointSize:2
Viewer.CameraSize: 0.08
Viewer.CameraLineWidth: 3
Viewer.ViewpointX: 0
Viewer.ViewpointY: -0.7
Viewer.ViewpointZ: -1.8
Viewer.ViewpointF: 500
There are several calibration toolboxes commonly used for calibrating monocular, stereo or multi-camera setups.
The first one is ros_camera_calibration. When running ORB SLAM, I prefer to use this package to obtain the intrinsic parameters of a single moving camera. The intrinsic parameters, distortion coefficients and projection matrices are produced after moving the calibration board around.
The second one, which I used recently, is Kalibr. It is designed not only to calibrate multi-camera rigs, but also to jointly calibrate the cameras and inertial measurement units (IMUs).
Besides these, you can also use MATLAB to get the camera's intrinsic parameters.
As for your questions, here are my imperfect answers.
Q.A: K (fx, fy, cx, cy) holds the intrinsic parameters of the camera, and k1, k2, p1, p2 are the distortion coefficients.
Q.B: As far as I'm concerned, obtaining the intrinsic parameters fx, fy, cx, cy is sufficient to run ORB SLAM 2 with your own cameras.
Q.C & D: If you choose to use the ROS package, you will receive the projection matrix and rectification transformation at the end.
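To make Q.C & D concrete without ROS: if you run cv2.stereoCalibrate yourself and obtain the two intrinsic matrices K1, K2, the distortion vectors D1, D2 and the relative pose R, T, a minimal sketch of how OpenCV produces the R and P matrices for the settings.yaml (variable names are placeholders):

import cv2

# R1/R2 are the rectification rotations (LEFT.R / RIGHT.R),
# P1/P2 the new projection matrices (LEFT.P / RIGHT.P)
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
    K1, D1, K2, D2, (640, 480), R, T, alpha=0)

The fx, fy, cx, cy entries at the top of the file should then match the left projection matrix P1, as the comment in the yaml notes.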
At the moment I am implementing the calibration method(s) for stereo vision. I am using the OpenCV library.
There is an example in the samples folder, but I have some questions about the implementation:
What are these arrays for, and what are those CvMat variables?
// ARRAY AND VECTOR STORAGE:
double M1[3][3], M2[3][3], D1[5], D2[5];
double R[3][3], T[3], E[3][3], F[3][3];
CvMat _M1 = cvMat(3, 3, CV_64F, M1 );
CvMat _M2 = cvMat(3, 3, CV_64F, M2 );
CvMat _D1 = cvMat(1, 5, CV_64F, D1 );
CvMat _D2 = cvMat(1, 5, CV_64F, D2 );
CvMat _R = cvMat(3, 3, CV_64F, R );
CvMat _T = cvMat(3, 1, CV_64F, T );
CvMat _E = cvMat(3, 3, CV_64F, E );
CvMat _F = cvMat(3, 3, CV_64F, F );
In other examples I see this code:
//--------Find and Draw chessboard--------------------------------------------------
if ((frame++ % 20) == 0)
{
    //----------------CAM1-------------------------------------------------------------------------------------------------------
    result1 = cvFindChessboardCorners( frame1, board_sz, &temp1[0], &count1, CV_CALIB_CB_ADAPTIVE_THRESH | CV_CALIB_CB_FILTER_QUADS );
    cvCvtColor( frame1, gray_fr1, CV_BGR2GRAY );
What exactly does the if statement do? Why % 20?
Thank you in advance!
Update:
I have a two questions about some implementation code: link
-1: The nx and ny variables that are declared in line 18 and used in the board_sz variable at line 25: are these the rows and columns of corners in the chessboard pattern? (I think so, because cvSize has parameters for width and height.)
-2: What are these CvMat variables for (lines 143 - 146)?
CvMat _objectPoints = cvMat(1, N, CV_32FC3, &objectPoints[0] );
CvMat _imagePoints1 = cvMat(1, N, CV_32FC2, &points[0][0] );
CvMat _imagePoints2 = cvMat(1, N, CV_32FC2, &points[1][0] );
CvMat _npoints = cvMat(1, npoints.size(), CV_32S, &npoints[0] );
Each of those matrices has a meaning in epipolar geometry. They describe the relation between your two cameras in 3D space and between the images they record.
In your example, they are:
M1 - the camera intrinsics matrix of your left camera
M2 - the camera intrinsics matrix of your right camera
D1 - the distortion coefficients of your left camera
D2 - the distortion coefficients of your right camera
R - the rotation matrix from the right to your left camera
T - the translation vector from the right to your left camera
E - the essential matrix of your stereo setup
F - the fundamental matrix of your stereo setup
On the basis of these matrices, you can undistort and rectify your images, which allows you to extract the depth of a point you see in both images by way of their disparity (the difference in x, basically). Finding a point in both images is called matching, and is generally the last step after rectification.
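For illustration, here is a rough sketch of that pipeline using the modern Python API rather than the old CvMat one (M1, D1, M2, D2, R, T are the calibration outputs named above; image_size, left_img and right_img are placeholders):

import cv2

# Rectification rotations and new projection matrices from the stereo calibration
R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(M1, D1, M2, D2, image_size, R, T)

# Per-camera lookup maps that undistort and rectify in a single remap() call
map1x, map1y = cv2.initUndistortRectifyMap(M1, D1, R1, P1, image_size, cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(M2, D2, R2, P2, image_size, cv2.CV_32FC1)
left_rect = cv2.remap(left_img, map1x, map1y, cv2.INTER_LINEAR)
right_rect = cv2.remap(right_img, map2x, map2y, cv2.INTER_LINEAR)

# Matching along the now-horizontal epipolar lines gives the disparity
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left_rect, right_rect)   # expects single-channel 8-bit images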
Any good introduction to epipolar geometry and stereo vision will probably be better than anything I could type up here. I recommend the Learning OpenCV book from which your example code is taken and which goes into great detail explaining the basics.
The second part of your question has already been answered in a comment:
(frame++ % 20) is 0 for every 20th frame recorded from your webcam, so the code in the if-clause is executed once per 20 frames.
Response to your update:
nx and ny are the number of corners in the chessboard pattern in your calibration images. On a "normal" 8x8 chessboard, nx = ny = 7. You can see that in lines 138-139, where the points of one ideal chessboard are created by offsetting nx*ny points with a distance of squareSize, the size of one square in your chessboard.
The CvMat variables "objectPoints", "imagePoints" and "npoints" are passed into the cvStereoCalibrate function.
objectPoints contains the points of your calibration object (the chessboard)
imagePoints1/2 contain these points as seen by each of your cameras
npoints just contains the number of points in each image (as an M-by-1 matrix) - feel free to ignore it, it's not used in the OpenCV C++ API any more anyway.
Basically, cvStereoCalibrate fits the imagePoints to the objectPoints and returns 1) the distortion coefficients, 2) the intrinsic camera matrices and 3) the spatial relation of the two cameras as the rotation matrix R and translation vector T. The first are used to undistort your images, the second relates your pixel coordinates to real-world coordinates, and the third allows you to rectify your two images.
As a side note: I remember having trouble with the stereo calibration because the chessboard orientation could be detected differently in the left and right camera images. This shouldn't be a problem unless you have a large angle between your cameras (which isn't a great idea) or you incline your chessboards a lot (which isn't necessary), but you can still keep an eye out.
The x-derivative Sobel kernel looks like this:
-1 0 +1
-2 0 +2
-1 0 +1
Let's say there are two samples of my image which look like this (0 = black, 1 = white):
0 0 1 1 0 0
0 0 1 & 1 0 0
0 0 1 1 0 0
If I perform the convolution, I'll end up with 4 and -4 respectively.
So my natural response would be to normalize the result by 8 and translate it by 0.5 - is that correct?
(I am wondering because I can't find Wikipedia etc. mentioning any normalization.)
EDIT:
I use the Sobel filter to create a 2D structure tensor (with the derivatives dX and dY):

                    | A  B |
Structure Tensor =  | C  D |

with A = dx^2, B = dx*dy, C = dx*dy, D = dy^2
Ultimately I want to store the result in [0,1], but right now I'm just wondering if I have to normalize the Sobel result (by default, not just in order to store it) or not, i.e.:
A = dx*dx
//OR
A = (dx/8.0)*(dx/8.0)
//OR
A = (dx/8.0+0.5)*(dx/8.0+0.5)
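For reference, this is roughly how I build the tensor components in numpy/OpenCV (the random image is just a stand-in for my real data; the commented-out /8.0 is exactly the normalization I am asking about):

import numpy as np
import cv2

img = np.random.rand(64, 64).astype(np.float32)   # placeholder grayscale image in [0, 1]

dx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)    # optionally: dx /= 8.0
dy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)    # optionally: dy /= 8.0

A = dx * dx        # structure tensor components, per pixel
B = dx * dy        # C equals B
D = dy * dy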
The Sobel filter is the composition of a finite difference filter in one dimension:
[ 1 0 -1 ] / 2
and a smoothing filter in the other dimension:
[ 1 2 1 ] / 4
Therefore, the proper normalization to the kernel as typically defined is 1/8.
This normalization is required when a correct estimate of the derivative is needed. When computing the gradient magnitude for detecting edges, the scaling is irrelevant. This is why often the Sobel filter is implemented without the 1/8 normalization.
The 1/4 in the smoothing filter is to normalize it to 1. The 1/2 in the finite difference filter comes from the distance between the two pixels compared. The derivative is defined as the limit of h to zero of [f(x+h)-f(x)]/h. For the finite difference approximation, we can choose h=1, leading to a filter [1,-1], or h=2, leading to the filter above. The advantage with h=2 is that the filter is symmetric, with h=1 you end up computing the derivative in the middle between the two pixels, thus the result is shifted by half a pixel.
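A quick numeric check of that factor (a small illustration, not part of the derivation above): on a ramp that increases by exactly one gray level per pixel, the raw kernel responds with 8, and dividing by 8 recovers the true slope of 1.

import numpy as np

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# Each row is 0, 1, 2, ...: a slope of exactly 1 gray level per pixel
ramp = np.tile(np.arange(10, dtype=float), (5, 1))

patch = ramp[1:4, 3:6]            # 3x3 neighbourhood around an interior pixel
resp = np.sum(patch * sobel_x)    # plain (correlation-style) application of the kernel
print(resp, resp / 8.0)           # 8.0 and 1.0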
A mathematically correct normalization for the Sobel filter is 1/8, because it brings the result to the natural units of one gray-level per pixel. But in practical programming, this isn't necessarily the right thing to do.
The structure tensor is made of sums of products, so the normalization to [0, 1] is not really helpful.
You mainly use the Eigenvalues, possibly comparing them to a threshold. Rather than normalizing all values, it is more efficient to adjust the threshold.
The ratios of the coefficients or of the Eigenvalues are normalization-independent.
Now if you want to store the Sobel components and your pixel data type is unsigned bytes [0, 255], the range of the components will be [-1020, 1020], which you rescale to [0, 255] by adding 1024 and dividing by 8.
If you only need to store the gradient modulus (L∞ norm), the range is [0, 1020] and a division by 4 is enough.
Final remark: the Sobel components are usually small, except along strong edges. So it can make sense, to preserve accuracy, to use a smaller denominator (1, 2 or 4) and clamp the values that are out of range.
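A small sketch of that storage trade-off (assuming an 8-bit input image; the file name is a placeholder):

import numpy as np
import cv2

img = cv2.imread('input.png', cv2.IMREAD_GRAYSCALE)    # uint8 image, placeholder path
dx = cv2.Sobel(img, cv2.CV_16S, 1, 0, ksize=3)         # unnormalized Sobel, range [-1020, 1020]

# Exact fit into [0, 255]: offset and divide by 8, as described above (coarse quantization)
stored_coarse = ((dx.astype(np.int32) + 1024) // 8).astype(np.uint8)

# Finer quantization for the usually small responses, clamping the rare strong edges
stored_fine = np.clip(dx // 2 + 128, 0, 255).astype(np.uint8)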
The Sobel filter is a somewhat heuristic approach to differentiation in the horizontal or vertical direction, so the normalization is somewhat arbitrary. I found the following normalization makes more sense than others: divide by half of the sum of the absolute values of the kernel.
http://www.imagemagick.org/discourse-server/viewtopic.php?t=14434&start=30
In fact, scikit-image uses this approach, e.g.:
>>> from skimage import filters
>>> import numpy as np
>>> one = np.ones((3, 3))
>>> one[:, 0] = 2
>>> one
array([[ 2.,  1.,  1.],
       [ 2.,  1.,  1.],
       [ 2.,  1.,  1.]])
>>> filters.sobel_v(one)
array([[ 0.,  0.,  0.],
       [ 0., -1.,  0.],
       [ 0.,  0.,  0.]])