Volume of the camera calibration - opencv

I am dealing with a problem that concerns camera calibration. I need calibrated cameras to make measurements of 3D objects. I am using OpenCV to carry out the calibration, and I am wondering how I can predict or calculate the volume in which the camera is well calibrated. Is there a way to increase that volume, especially in the direction of the optical axis? Does increasing the movement range of the calibration target in the 'z' direction make a sufficient difference?

I think you are confusing a few key things in your question:
Camera calibration - this means finding the matrices (intrinsic and extrinsic) that describe the camera's position, rotation, up vector, distortion, optical center, and so on.
Epipolar Rectification - this means virtually "rotating" the image planes so that they become coplanar (parallel). This simplifies the stereo reconstruction algorithms.
For camera calibration you do not need to care about any volumes - there are no volumes in which the camera is well or badly calibrated. If you use the chessboard pattern calibration, your cameras are either calibrated or not.
When dealing with rectification, you want to know which areas of the rectified images correspond, and also to maximize those areas. OpenCV lets you choose between two extremes: either keep only valid pixels and crop away anything that does not fit into the rectangular area, or include all pixels, even invalid ones.
The OpenCV documentation has a nice, more detailed description here: http://opencv.willowgarage.com/documentation/camera_calibration_and_3d_reconstruction.html
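To illustrate the two extremes, here is a rough Python sketch built around the alpha parameter of cv2.stereoRectify; the intrinsics, distortion coefficients and the R/T between the cameras are assumed to come from an earlier stereo calibration, and all names are placeholders:

```python
import cv2

def rectify_pair(K1, d1, K2, d2, R, T, image_size, left_img, right_img, alpha=0):
    """Rectify a stereo pair. alpha=0 keeps only valid pixels (crops the images),
    alpha=1 keeps every pixel, including the invalid (black) border regions."""
    R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
        K1, d1, K2, d2, image_size, R, T, alpha=alpha)
    m1x, m1y = cv2.initUndistortRectifyMap(K1, d1, R1, P1, image_size, cv2.CV_32FC1)
    m2x, m2y = cv2.initUndistortRectifyMap(K2, d2, R2, P2, image_size, cv2.CV_32FC1)
    left_rect = cv2.remap(left_img, m1x, m1y, cv2.INTER_LINEAR)
    right_rect = cv2.remap(right_img, m2x, m2y, cv2.INTER_LINEAR)
    # roi1 / roi2 are the rectangles of guaranteed-valid pixels in the rectified images.
    return left_rect, right_rect, Q, roi1, roi2
```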

Related

Difference between stereo camera calibration vs two single camera calibrations using OpenCV

I have a vehicle with two cameras, left and right. Is there a difference between calibrating each camera separately and performing a "stereo calibration"? I am asking because I noticed in the OpenCV documentation that there is a stereoCalibrate function, and also a stereo calibration tool for MATLAB. If I do a separate camera calibration on each and then perform a depth calculation using the undistorted images of each camera, will the results be the same?
I am not sure what the difference is between the two methods. I performed normal camera calibration for each camera separately.
For intrinsics, it doesn't matter. The added information ("pair of cameras") might make the calibration a little better though.
Stereo calibration gives you the extrinsics, i.e. the transformation (rotation and translation) between the cameras. That's what stereo vision needs. If you don't perform stereo calibration, you lack the extrinsics, and without them you can't do any depth estimation at all.
TL;DR
You need stereo calibration if you want 3D points.
Long answer
There is a huge difference between single and stereo camera calibration.
The output of single camera calibration is the intrinsic parameters only (i.e. the 3x3 camera matrix and a number of distortion coefficients, depending on the model used). In OpenCV this is accomplished by cv2.calibrateCamera. You may check my custom library, which helps reduce the boilerplate.
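A minimal sketch of that single-camera step, assuming the board corners have already been detected (obj_points, img_points and image_size are placeholder names):

```python
import cv2

def calibrate_single(obj_points, img_points, image_size):
    """Single-camera calibration. obj_points is a list of Nx3 board-corner arrays
    (board plane at Z = 0), img_points the matching Nx2 pixel detections per view."""
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_points, img_points, image_size, None, None)
    # K is the 3x3 camera matrix, dist the distortion coefficients;
    # rms is the reprojection error, a quick sanity check of the calibration.
    return K, dist, rms
```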
When you do stereo calibration, the output is the intrinsics of both cameras plus the extrinsic parameters.
In OpenCV this is done with cv2.stereoCalibrate. OpenCV fixes the world origin in the first camera and then you get a rotation matrix R and translation vector t to go from the first camera (origin) to the second one.
So, why do we need the extrinsics? If you are using a stereo system for 3D scanning, then you need them (and the intrinsics) to do triangulation, that is, to obtain 3D points in space: if you know the projections of a general point p in space onto both cameras, then you can calculate its position.
To add something to what @Christoph correctly answered before: the intrinsics should be almost the same. However, cv2.stereoCalibrate may improve the calculation of the intrinsics if the flag CALIB_FIX_INTRINSIC is not set. This happens because the system composed of the two cameras and the calibration board is solved as a whole by numerical optimization.
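A rough sketch of the stereo step and the triangulation it enables, assuming both cameras were already calibrated individually and that matching chessboard detections are available (all names are placeholders):

```python
import cv2
import numpy as np

def stereo_calibrate_and_triangulate(obj_points, img_points_left, img_points_right,
                                     K1, d1, K2, d2, image_size, pts_left, pts_right):
    """Estimate R, T between the cameras, then triangulate matched pixel points."""
    # CALIB_FIX_INTRINSIC keeps the per-camera intrinsics fixed;
    # drop it to let the joint optimization refine them as well.
    rms, K1, d1, K2, d2, R, T, E, F = cv2.stereoCalibrate(
        obj_points, img_points_left, img_points_right,
        K1, d1, K2, d2, image_size, flags=cv2.CALIB_FIX_INTRINSIC)

    # Projection matrices, with the world origin fixed in the first camera.
    P1 = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K2 @ np.hstack([R, T.reshape(3, 1)])

    # pts_left / pts_right: 2xN arrays of matched (ideally undistorted) pixel coordinates.
    points_4d = cv2.triangulatePoints(P1, P2, pts_left, pts_right)
    points_3d = (points_4d[:3] / points_4d[3]).T   # Nx3, in the first camera's frame
    return R, T, points_3d
```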

Use EMGU to get "real world" coordinates of pixel values

There are a number of calibration tutorials to calibrate camera images of chessboards in EMGU (OpenCV). They all end up calibrating and then undistorting an image for display. That's cool and all but I need to do machine vision where I am taking an image, identifying the location of a corner or blob or feature in the image and then translating the location of that feature in pixels into real world X, Y coordinates.
Pixel -> mm.
Is this possible with EMGU? If so, how? I'd hate to spend a bunch of time learning EMGU and then not be able to do this crucial function.
Yes, it's certainly possible; this is the "bread and butter" of OpenCV.
The calibration you are describing, in terms of removing distortions, is a prerequisite to this process. After which, the following applies:
The Intrinsic calibration, or "camera matrix" is the first of two required matrices. The second is the Extrinsic calibration of the camera which is essentially the 6 DoF transform that describes the physical location of the sensor center relative to a coordinate reference frame.
All of the distortion coefficients, intrinsic, and extrinsic calibrations are available from a single function in Emgu.CV: CvInvoke.CalibrateCamera. This process is best explained by one of the many tutorials you have described.
After that, CvInvoke.ProjectPoints applies the transforms above to map 3D world coordinates to 2D pixel locations. Going the other way, from a pixel to real-world coordinates, additionally requires knowing which plane the feature lies on (e.g. the board's Z = 0 plane), because a single pixel only defines a ray.
The key to doing this successfully is providing comprehensive IInputArray objectPoints and IInputArray imagePoints to CvInvoke.CalibrateCamera. Be sure to provide "excitation" by using many images from many different perspectives.
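For reference, a rough sketch of the same workflow in OpenCV's Python API, which maps almost one-to-one onto the Emgu.CV calls above; the pixel-to-mm direction assumes the feature lies on the board's Z = 0 plane, because a single pixel only defines a ray (all names are placeholders):

```python
import cv2
import numpy as np

def pixel_to_board_mm(pixel, K, dist, rvec, tvec):
    """Map a pixel to millimetre coordinates on the calibration board's Z = 0 plane.
    K, dist, rvec, tvec come from a CalibrateCamera / SolvePnP style result."""
    # Remove lens distortion first so the plain pinhole homography applies.
    p = cv2.undistortPoints(np.array([[pixel]], dtype=np.float64), K, dist, P=K).reshape(2)
    R, _ = cv2.Rodrigues(rvec)
    # Homography from board-plane coordinates (X, Y, 1) to pixels: H = K [r1 r2 t].
    H = K @ np.column_stack([R[:, 0], R[:, 1], np.asarray(tvec).reshape(3)])
    xy1 = np.linalg.inv(H) @ np.array([p[0], p[1], 1.0])
    return xy1[:2] / xy1[2]   # (X, Y) in the board's units, e.g. mm

def board_point_to_pixel(xy_mm, K, dist, rvec, tvec):
    """The forward direction (ProjectPoints): a 3D board point -> pixel."""
    obj = np.array([[xy_mm[0], xy_mm[1], 0.0]], dtype=np.float64)
    img_pts, _ = cv2.projectPoints(obj, rvec, tvec, K, dist)
    return img_pts.reshape(2)
```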

How to relate detected keypoints after auto-focus

I'm working with a stereo camera setup that has auto-focus (which I cannot turn off) and a really low baseline of less than 1 cm.
The auto-focus process can change any intrinsic parameter of both cameras (such as the focal length and the principal point), and without a fixed relation between the cameras (the left camera may increase focus while the right one decreases it). Luckily, the cameras always report the current state of their intrinsics with great precision.
On every frame an object of interest is detected and disparities between the camera images are calculated. As the baseline is quite low and the resolution is not the greatest, stereo triangulation gives quite poor results, so several subsequent computer vision algorithms rely only on image keypoints and disparities.
Now, disparities calculated on different stereo frames cannot be directly related: if the principal point changes, the disparities will have very different magnitudes after the auto-focus process.
Is there any way to relate keypoint corners and/or disparities between frames after an auto-focus step? For example, by calculating where the object would lie in the image with the previous intrinsics?
Maybe by using a bearing vector towards the object and then looking for its intersection with the image plane defined by the previous intrinsics?
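Something along these lines is what I have in mind: undistort a keypoint with the current intrinsics to get a bearing, then re-project it with the previous intrinsics (a rough sketch only, names are placeholders):

```python
import cv2
import numpy as np

def remap_keypoints(points, K_now, dist_now, K_prev, dist_prev):
    """Re-express pixel keypoints detected under the current (post-autofocus)
    intrinsics as if they had been observed under the previous intrinsics.
    Only the intrinsic change is compensated; the scene and the pose are unchanged."""
    pts = np.asarray(points, dtype=np.float64).reshape(-1, 1, 2)
    # Pixel -> normalized bearing (x, y) on the z = 1 plane, using the current intrinsics.
    normalized = cv2.undistortPoints(pts, K_now, dist_now).reshape(-1, 2)
    bearings = np.hstack([normalized, np.ones((len(normalized), 1))])
    # Re-project the same bearings through the previous camera model (no rotation/translation).
    remapped, _ = cv2.projectPoints(bearings, np.zeros(3), np.zeros(3), K_prev, dist_prev)
    return remapped.reshape(-1, 2)
```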
Your project is quite challenging; perhaps these patents could help you in some way:
Stereo yaw correction using autofocus feedback
Autofocus for stereo images
Depth information for auto focus using two pictures and two-dimensional Gaussian scale space theory

Finding the relative pose between two cameras with 2D and 3D correspondences

I have two images obtained by a calibrated camera from two different poses. I also have correspondences of 2D points between the images. Some of the points have depth information, so I also know their 3D coordinates. I want to calculate the relative pose between the images.
I know I can compute a fundamental matrix or an essential matrix from the 2D points. I also know PnP can find the pose from 2D-to-3D correspondences, and that it is also doable with just correspondences of 3D points. However, I don't know any algorithm that takes advantage of all the available information. Is there one?
There is only one such algorithm: bundle adjustment - everything else is a hack. Get your initial estimates separately, merge them in any "reasonable & simple" hacky way to get an initial solution, then bite the bullet and bundle. If you are coding in C++, Google's Ceres is my recommended B.A. library.
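For the initial estimates, a rough OpenCV sketch of the two separate pose computations mentioned in the question, which would then be merged and refined jointly by bundle adjustment (names are placeholders):

```python
import cv2

def initial_pose_estimates(pts1, pts2, pts2_of_3d, pts3d, K):
    """Two independent estimates of the relative pose between the views.
    pts1/pts2: Nx2 matched pixels in image 1 and 2; pts3d: Mx3 known 3D points
    (in the first camera's frame) whose projections in image 2 are pts2_of_3d (Mx2)."""
    # 2D-2D: essential matrix + cheirality check; translation is only up to scale.
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R_e, t_e, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)

    # 2D-3D: PnP recovers the pose with metric scale from the known 3D points.
    _, rvec, tvec, _ = cv2.solvePnPRansac(pts3d, pts2_of_3d, K, None)
    R_pnp, _ = cv2.Rodrigues(rvec)

    # Merge these however you like (e.g. take R, t from PnP and the essential-matrix
    # inlier set for outlier rejection), then refine everything with bundle adjustment.
    return (R_e, t_e), (R_pnp, tvec)
```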

Minimum number of chessboard images for Stereo Calibration and Rectification

What is the minimum number of chessboard image pairs needed to mathematically calibrate and rectify two cameras? One pair is a single view of the chessboard by each camera, ending with a left and a right image of the same scene. As far as I know, we need just one pair for a stereo system, as stereo calibration seeks the relation between the two cameras.
Stereo calibration seeks not only the rotation and translation between the two cameras, but also the intrinsic and distortion parameters of each camera. You need at least two images to calibrate each camera separately, just to get the intrinsics. If you have already calibrated each camera separately, then yes, you can use a single pair of checkerboard images to get R and t. However, you will not get very good accuracy.
As a rule of thumb, you need 10-20 image pairs. You need enough images to cover the field of view, and to have a good distribution of 3D orientations of the board.
To calibrate a stereo pair of cameras, you first calibrate the two cameras separately, and then you do another joint optimization of the parameters of both cameras plus the rotation and translation between them. So one pair of images will simply not work.
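A rough sketch of the data-gathering step, assuming a chessboard with placeholder pattern size and square size; the collected lists then feed the per-camera calibration and the joint stereo calibration:

```python
import cv2
import numpy as np

def collect_stereo_corners(image_pairs, pattern_size=(9, 6), square_size=25.0):
    """Accumulate matching chessboard detections from many left/right image pairs.
    Aim for 10-20 pairs that cover the field of view with varied board orientations."""
    # 3D corner coordinates of the board in its own plane (Z = 0), in e.g. mm.
    objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2) * square_size

    obj_points, left_points, right_points = [], [], []
    for left, right in image_pairs:
        ok_l, corners_l = cv2.findChessboardCorners(left, pattern_size)
        ok_r, corners_r = cv2.findChessboardCorners(right, pattern_size)
        if ok_l and ok_r:   # keep only views where both cameras see the whole board
            obj_points.append(objp)
            left_points.append(corners_l)
            right_points.append(corners_r)
    # These lists feed cv2.calibrateCamera (per camera) and then cv2.stereoCalibrate.
    return obj_points, left_points, right_points
```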
Edit:
The camera calibration algorithm used in OpenCV, Caltech Calibration Toolbox, and the Computer Vision System Toolbox for MATLAB is based on the work by Zhengyou Zhang. His paper explains it better than I ever could.
The crux of the issue here is that the points on the chessboard are co-planar, which is a degenerate configuration. You simply cannot solve for the intrinsics using just one view of a planar board. You need more than one view, with the board in different 3-D orientations. Views where the boards are in parallel planes do not add any information.
"One image with 3 corners give us 6 pieces of information can be used to solve both intrinsic and distortion. "
I think that this is your main error: those corners are not independent. A 100x100 chessboard does not provide more information than a 10x10 one in your perfect world, because the points all lie on the same plane.
If you have a single view of a chessboard, a closer distance to the board can be compensated by a change in focal length, so even in your perfect world you are not able to calibrate both the intrinsic and the extrinsic parameters of your camera.
