I have a stereo camera rig. I have captured a chessboardPattern sequence (the same sequence, two pictures per exposure). I have performed a single camera calibration on the individual cameras using cv2.calibrateCamera.
My question is, is running a cv2.stereoCalibrate on both cameras redundant, given that calibrateCamera has provided me with object-relative position and orientation of the individual cameras? If not, what benefits does it provide me with?
The intrinsic parameters are computed in both cv2.stereoCalibrate and cv2.calibrateCamera by the same function, 'cvCalibrateCamera2'; the difference is that in cv2.stereoCalibrate you can disable this calculation using the flags.
No, this functionality is not redundant, because the extrinsic parameters are calculated differently. As demonstrated in this tutorial, calibrateCamera lets you find 3D points using a single camera over multiple frames, which is what a stereo camera can do in a single frame (taken by both cameras). In stereoCalibrate, the extrinsic parameters are computed with respect to both cameras.
Since you already have a stereo rig, use stereoCalibrate to get the intrinsic and extrinsic parameters. This page has information about how to use those parameters to create a depth map.
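To make the relationship concrete, here is a minimal sketch of how the per-view board poses from two separate calibrateCamera runs relate to the single camera-to-camera transform that stereoCalibrate reports. It assumes both cameras saw the same board view and that each pose maps board coordinates into that camera's frame; the helper names relative_pose and rot_y and all numeric values are made up for illustration.

```python
import numpy as np

def relative_pose(R1, t1, R2, t2):
    """Return (R, T) such that X_cam2 = R @ X_cam1 + T.

    R1, t1: board-to-camera-1 pose for one view (X_cam1 = R1 @ X_board + t1).
    R2, t2: board-to-camera-2 pose for the same view.
    """
    R = R2 @ R1.T
    T = t2 - R @ t1
    return R, T

def rot_y(angle):
    """Rotation matrix for a rotation of `angle` radians about the y axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

# Hypothetical ground-truth transform between the two cameras.
R_true, T_true = rot_y(0.02), np.array([-0.12, 0.0, 0.0])

# One board pose as seen from camera 1, and the matching pose for camera 2.
R1, t1 = rot_y(0.3), np.array([0.05, -0.02, 0.8])
R2, t2 = R_true @ R1, R_true @ t1 + T_true

R, T = relative_pose(R1, t1, R2, t2)
print(np.round(T, 3))
```

With real data, calibrateCamera returns rotation vectors, so cv2.Rodrigues would be needed to get R1 and R2, and every view would give a slightly different (R, T) because of noise. That is the practical reason to let stereoCalibrate fit one transform over all views.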
OpenCV Documentation
I have a vehicle with two cameras, left and right. Is there a difference between me calibrating each camera separately vs me performing "stereo calibration" ? I am asking because I noticed in the OpenCV documentation that there is a stereoCalibrate function, and also a stereo calibration tool for MATLAB. If I do separate camera calibration on each and then perform a depth calculation using the undistorted images of each camera, will the results be the same ?
I am not sure what the difference is between the two methods. I performed normal camera calibration for each camera separately.
For intrinsics, it doesn't matter. The added information ("pair of cameras") might make the calibration a little better though.
Stereo calibration gives you the extrinsics, i.e. transformation matrices between cameras. That's for... stereo vision. If you don't perform stereo calibration, you would lack the extrinsics, and then you can't do any depth estimation at all, because that requires the extrinsics.
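As a small worked example of why the extrinsics matter for depth: for a rectified pair, depth follows from Z = f * B / d, where the baseline B is the length of the translation between the cameras. The function name and the numbers below are assumptions chosen for illustration, not values from any real rig.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in metres for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: 700 px focal length, 12 cm baseline, 35 px disparity.
z = depth_from_disparity(disparity_px=35.0, focal_px=700.0, baseline_m=0.12)
print(z)  # about 2.4 (metres)
```

Without the baseline from stereo calibration, the disparity only gives depth up to an unknown scale.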
TL;DR
You need stereo calibration if you want 3D points.
Long answer
There is a huge difference between single and stereo camera calibration.
The output of single camera calibration is the intrinsic parameters (i.e. the 3x3 camera matrix and a number of distortion coefficients, depending on the model used), plus a board-relative pose for each view, which by itself says nothing about how two cameras relate to each other. In OpenCV this is accomplished by cv2.calibrateCamera. You may check my custom library that helps reduce the boilerplate.
When you do stereo calibration, its output is given by the intrinsics of both cameras and the extrinsic parameters.
In OpenCV this is done with cv2.stereoCalibrate. OpenCV fixes the world origin in the first camera and then you get a rotation matrix R and translation vector t to go from the first camera (origin) to the second one.
So, why do we need extrinsics? If you are using a stereo system for 3D scanning then you need those (and the intrinsics) to do triangulation, so to obtain 3D points in the space: if you know the projection of a general point p in the space on both cameras, then you can calculate its position.
To add something to what @Christoph correctly answered before: the intrinsics should be almost the same; however, cv2.stereoCalibrate may improve the calculation of the intrinsics if the flag CALIB_FIX_INTRINSIC is not set. This happens because the system composed of the two cameras and the calibration board is solved as a whole by numerical optimization.
I am trying to create a disparity map of the images created by the stereo camera mounted on the Valve Index VR headset. I am using OpenVR and OpenCV. OpenVR allows access to the cameras using the IVRTrackedCamera interface. In order to perform OpenCV StereoBM the left and right images need to be rectified.
My first question:
OpenVR allows for acquiring frames using GetVideoStreamFrameBuffer(). This method allows for passing an EVRTrackedCameraFrameType, either Distorted, Undistorted or MaximumUndistorted. What do the different FrameTypes mean? Can I assume the frames are already rectified onto a common plane when using the Undistorted or MaxUndistorted frametypes?
Second question:
If the frames are not yet rectified onto a common plane, how do I do so? With OpenVR I can get camera intrinsics for each individual camera using GetCameraIntrinsics(), again supplying an EVRTrackedCameraFrameType. I can also acquire the distortion parameters for each individual camera using GetArrayTrackedDeviceProperty(Prop_CameraDistortionCoefficients_Float_Array). Now, the two parameters I am missing for OpenCV's stereoRectify() are:
R – Rotation matrix between the coordinate systems of the first and the second cameras.
T – Translation vector between coordinate systems of the cameras.
Is it possible to acquire these parameters from OpenVR?
How important is it to do camera calibration for ArUco? What if I don't calibrate the camera? What if I use calibration data from another camera? Do you need to recalibrate if the camera focus changes? What is the practical way of doing calibration for a consumer application?
Before answering your questions, let me introduce some generic concepts related to camera calibration. A camera is a sensor that captures the 3D world and projects it onto a 2D image. This is a transformation from 3D to 2D performed by the camera. The following OpenCV doc is a good reference for understanding how this process works and which camera parameters are involved. You can find detailed ArUco documentation in the following document.
In general, the camera model used by the main libraries is the pinhole model. In the simplified form of this model (without considering radial distortions) the camera transformation is represented using the following equation (from OpenCV docs):
The following image (from OpenCV doc) illustrates the whole projection process:
In summary:
P_im = K · [R | T] · P_world
Where:
P_im: 2D points projected in the image
P_world: 3D point from the world
K is the camera intrinsics matrix (this depends on the camera lens parameters. Every time you change the camera focus, for example, the focal distance values fx and fy within this matrix change)
R and T are the extrinsics of the camera. They represent the rotation and translation matrices for the camera, respectively. These are basically the matrices that represent the camera position/orientation in the 3D world.
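The summary above can be checked with a few lines of NumPy. This is a sketch of the distortion-free pinhole model only; the function name project_point and all numeric values are assumptions chosen so the arithmetic is easy to follow.

```python
import numpy as np

def project_point(K, R, T, P_world):
    """Pinhole projection without distortion: world point -> pixel (u, v)."""
    P_cam = R @ P_world + T      # extrinsics: world frame -> camera frame
    uvw = K @ P_cam              # intrinsics: camera frame -> homogeneous pixels
    return uvw[:2] / uvw[2]      # perspective division

# Hypothetical intrinsics: fx = fy = 800 px, principal point at (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)        # camera aligned with the world axes
T = np.zeros(3)      # camera at the world origin

p = project_point(K, R, T, np.array([0.1, -0.05, 2.0]))
print(p)  # approximately (360, 220)
```

Here u = 800 * 0.1 / 2 + 320 and v = 800 * (-0.05) / 2 + 240. Changing fx and fy (for example after refocusing) moves the projected point, which is why the intrinsics have to match the camera's current state.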
Now, let's go through your questions one by one:
How important is it to do camera calibration for ArUco?
Camera calibration is important in ArUco (or any other AR library) because you need to know how the camera maps the 3D world to the 2D image so you can project your augmented objects onto the physical world.
What if I don't calibrate the camera?
Camera calibration is the process of obtaining the camera parameters: intrinsic and extrinsic. The intrinsic parameters are in general fixed and depend on the physical properties of the camera, unless you change something such as the focus. In that case you have to recalculate them. Otherwise, if you are working with a camera that has a fixed focal distance, you only have to calculate them once.
The extrinsic parameters depend on the camera location/orientation in the world. Each time you move the camera, the R and T matrices change and you have to recalculate them. This is where libraries such as ArUco come in handy, because using markers you can obtain these values automatically.
In short, if you don't calibrate the camera you won't be able to project objects onto the physical world at the exact location (which is essential for AR).
What if I use calibration data from another camera?
It won't work; this is much like using an uncalibrated camera.
Do you need to recalibrate if the camera focus changes?
Yes, you have to recalculate the intrinsic parameters because the focal distance changes in this case.
What is the practical way of doing calibration for a consumer application?
It depends on your application, but in general you have to provide some method for manual re-calibration. There are also methods for automatic calibration using some 3D pattern.
What is the minimum number of chessboard image pairs needed to mathematically calibrate and rectify two cameras? One pair is considered as a single view of the chessboard by each camera, giving a left and a right image of the same scene. As far as I know we need just one pair for a stereo system, as the stereo calibration seeks the relation between the two cameras.
Stereo calibration seeks not only the rotation and translation between the two cameras, but also the intrinsic and distortion parameters of each camera. You need at least two images to calibrate each camera separately, just to get the intrinsics. If you have already calibrated each camera separately, then, yes, you can use a single pair of checkerboard images to get R and t. However, you will not get a very good accuracy.
As a rule of thumb, you need 10-20 image pairs. You need enough images to cover the field of view, and to have a good distribution of 3D orientations of the board.
To calibrate a stereo pair of cameras, you first calibrate the two cameras separately, and then you do another joint optimization of the parameters of both cameras plus the rotation and translation between them. So one pair of images will simply not work.
Edit:
The camera calibration algorithm used in OpenCV, Caltech Calibration Toolbox, and the Computer Vision System Toolbox for MATLAB is based on the work by Zhengyou Zhang. His paper explains it better than I ever could.
The crux of the issue here is that the points on the chessboard are co-planar, which is a degenerate configuration. You simply cannot solve for the intrinsics using just one view of a planar board. You need more than one view, with the board in different 3-D orientations. Views where the boards are in parallel planes do not add any information.
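The counting argument behind this can be written out, following the formulation in Zhang's paper. For one view of the plane z = 0, the homography is H = [h_1 h_2 h_3] = λ K [r_1 r_2 t]. Because r_1 and r_2 are orthonormal, each view contributes exactly two constraints on the intrinsics, no matter how many corners the board has:

```latex
\begin{aligned}
h_1^\top B\, h_2 &= 0, \\
h_1^\top B\, h_1 &= h_2^\top B\, h_2,
\qquad \text{where } B = K^{-\top} K^{-1}.
\end{aligned}
```

B is symmetric and defined up to scale, so it carries five unknowns (two focal lengths, skew, and the principal point). Two constraints per view therefore means at least three views in general, or two views if the skew is assumed to be zero. More corners in a single view only improve the estimate of that one H.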
"One image with 3 corners give us 6 pieces of information can be used to solve both intrinsic and distortion. "
I think that this is your main error. These corners are not independent. A 100x100 chessboard pattern does not provide more information than a 10x10 one in your perfect world, as all the points lie on the same plane.
If you have a single view of a chessboard, a closer distance to the board can be compensated for by the focal length, so you are not able (even in your perfect world) to calibrate both your camera's intrinsic AND extrinsic parameters.
I want to find the position of a point with OpenCV. I calibrated two cameras using cvCalibrateCamera2, so I know both the intrinsic and extrinsic parameters. I read that with known intrinsic and extrinsic parameters, I can easily reconstruct 3D by triangulation. Is there a function in OpenCV to achieve this? I think cvProjectPoint2 may be useful, but I don't understand exactly what it does. So how can I find the 3D position of a point?
Thanks.
You first have to find disparities. There are two algorithms implemented in OpenCV - block matching (cvFindStereoCorrespondenceBM) and graph cuts (cvFindStereoCorrespondenceGC). The latter one gives better results but is slower. After disparity detection you can reproject the disparities to 3D using cvReprojectImageTo3D. This gives you distances for each point of the input images that is in both camera views.
Also note that the stereo correspondence algorithms require a rectified image pair (use cvStereoRectify, cvInitUndistortRectifyMap and cvRemap).