During camera calibration, the usual advice is to use many images (>10) with variations in pose, depth, etc.
However, I notice that usually the fewer images I use, the smaller the reprojection error. For example, with 27 images cv::calibrateCamera returns 0.23, and with just 3 I get 0.11.
This may be due to the fact that during calibration we are solving a least-squares problem for an overdetermined system.
QUESTIONS:
Do we actually use the reprojection error as an absolute measure of how good a calibration is? For example, if I calibrate with 3 images and get 0.11, and then calibrate with 27 other images and get 0.23 can we really say that "the first calibration is better"?
OpenCV uses the same images both for calibration and for calculating the error. Isn't that a form of overfitting? Wouldn't it be more correct to use two different sets - one to compute the calibration parameters and one to compute the error? In that case, I would use the same (test) set to calculate the error for all calibration results obtained from different (training) sets. Wouldn't that be fairer?
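To illustrate what I mean, here is a rough sketch of the split I have in mind (the objpoints_*/imgpoints_* lists and image_size are placeholders for whatever your detection step produces):

    import numpy as np
    import cv2

    # objpoints_*: per-image (N,3) float32 board coordinates
    # imgpoints_*: per-image (N,1,2) float32 detected corners
    # (both produced elsewhere, e.g. by cv2.findChessboardCorners + cornerSubPix)

    # Calibrate on the "training" images only.
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        objpoints_train, imgpoints_train, image_size, None, None)

    # Evaluate on held-out "test" images: estimate each board pose with the
    # fixed intrinsics, reproject, and measure the residuals.
    sq_err, n_pts = 0.0, 0
    for obj, img in zip(objpoints_test, imgpoints_test):
        ok, rvec, tvec = cv2.solvePnP(obj, img, K, dist)
        proj, _ = cv2.projectPoints(obj, rvec, tvec, K, dist)
        sq_err += np.sum((img.reshape(-1, 2) - proj.reshape(-1, 2)) ** 2)
        n_pts += len(obj)

    print("in-sample RMS:", rms, "held-out RMS:", np.sqrt(sq_err / n_pts))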
Sorry if this is too late - only just saw it.
The error is the reprojection error of the fit. So: find points on an image, calculate the real-world model, recalculate where those points would fall on the image given the model, and report the difference. In a way this is a bit circular - you might have a model that is only correct for those few images, which would then report a very good error, while giving it lots of images will produce a much more generally correct model, but with a larger error, just because you are trying to stretch it to fit a much bigger space.
There does come a point where adding more images doesn't improve the fit, and may add noise, since points are never detected perfectly. What's important is to provide a wider range of poses - more angles and positions - rather than more of the same data.
Using the same image set to predict the error isn't really a problem because the fit does have a real meaning in terms of actual physical lens parameters - it's not like training/testing a neural net on the same data.
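Concretely, the number calibrateCamera reports can be reproduced from the fit's own outputs, which is what makes it a bit circular (a rough Python sketch, reusing the same point lists that were passed to the calibration):

    import numpy as np
    import cv2

    # rms, K, dist, rvecs, tvecs came from cv2.calibrateCamera(objpoints, imgpoints, ...)
    sq_err, n_pts = 0.0, 0
    for obj, img, rvec, tvec in zip(objpoints, imgpoints, rvecs, tvecs):
        proj, _ = cv2.projectPoints(obj, rvec, tvec, K, dist)  # model -> pixels
        sq_err += np.sum((proj.reshape(-1, 2) - img.reshape(-1, 2)) ** 2)
        n_pts += len(obj)
    print(np.sqrt(sq_err / n_pts))  # matches the RMS that calibrateCamera returned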
edit: a better calibration routine than OpenCV's (although based on the same concept) is included in 3D-DIC (free but not OSS; register on the site to get the download link) - see specifically the calibration manual.
I've spent some time trying to calibrate two similar cameras (ExCam IPQ1715 and ExCam IPQ1765) to varying degrees of success, with the eventual goal of using them for short-range photogrammetry. I've been using a charuco board, along with the OpenCV charuco calibration library, and have noticed that the quality of my calibration is closely tied to how much of the image is taken up by the board. (I measure calibration quality by the RMS reprojection error given by OpenCV, and also by just seeing if the undistorted images appear to have straighter lines on the board than the originals.)
I'm still pretty inexperienced, and there have been other factors messing with my calibration (leaving autofocus on, OpenCV charuco identification sometimes getting strange false positives on some images without me noticing), so my question is less about my results and more about best practice for camera calibration in general:
How crucial is it that the board (charuco, chessboard) take up most of the image space? Is there generally a minimum amount that it should cover? Is this even an issue at all, or am I likely mistaking it for another cause of bad calibration?
I've seen lots of calibration tutorials online where the board seems to take up a small portion of the image, but then have also found other people experiencing similar issues. In short, I'm horribly lost.
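For what it's worth, this is roughly how I've been quantifying how much of the frame the board covers (a sketch using the pre-4.7 cv2.aruco contrib API; the board and dictionary parameters are just illustrative, and newer OpenCV versions use cv2.aruco.CharucoDetector instead):

    import cv2

    aruco = cv2.aruco
    dictionary = aruco.getPredefinedDictionary(aruco.DICT_4X4_50)
    board = aruco.CharucoBoard_create(7, 5, 0.04, 0.03, dictionary)  # squares, square/marker size

    for fname in image_files:  # your list of calibration images
        gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
        corners, ids, _ = aruco.detectMarkers(gray, dictionary)
        if ids is None or len(ids) == 0:
            print(fname, "no markers detected")
            continue
        n, ch_corners, ch_ids = aruco.interpolateCornersCharuco(corners, ids, gray, board)
        if ch_corners is None or n < 4:
            print(fname, "too few charuco corners")
            continue
        hull = cv2.convexHull(ch_corners.reshape(-1, 1, 2))
        frac = cv2.contourArea(hull) / float(gray.shape[0] * gray.shape[1])
        print(f"{fname}: board fills {100 * frac:.1f}% of the frame")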
Any guidance would be awesome - thanks!
Consider that camera calibration is a model-fitting problem:
i.e. we optimize the model parameters against the measurements.
So... You should pay attention to:
If the board appears so small in the image that no distortion is visible across it,
is it possible to optimize the distortion parameters from such an image?
If the pattern is only observed near the center of the image,
is it possible to estimate valid parameter values for regions far from the center?
(That would be an extrapolation.)
If the pattern distribution is not uniform, the density of the data can bias the results.
E.g. with least-squares optimization, errors in regions with little data can end up being neglected.
Therefore, my suggestions are:
Pattern images that are extremely small are useless.
The data should cover the entire field of view of the camera image, and the distribution should be as uniform as possible.
Use enough data; too few images may cause overfitting.
Check the pattern detection results for all images (sample code often omits this) - see the sketch below.
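For example, something along these lines (chessboard case; pattern_size and the file list are placeholders) lets you eyeball every detection before trusting it:

    import cv2

    pattern_size = (9, 6)  # inner corners of the board, adjust to yours
    for fname in image_files:
        img = cv2.imread(fname)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        found, corners = cv2.findChessboardCorners(gray, pattern_size)
        if found:
            corners = cv2.cornerSubPix(
                gray, corners, (11, 11), (-1, -1),
                (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001))
            cv2.drawChessboardCorners(img, pattern_size, corners, found)
        else:
            print(fname, "pattern not found")
        cv2.imshow("detection", img)
        cv2.waitKey(0)  # step through with any key and discard bad frames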
I used the OpenCV sample code for stereo camera calibration to get the intrinsics and extrinsics of my stereo camera. I used 149 image pairs and the program detected 114 image pairs.
Result of my Calibration:
..... 114 pairs have been successfully detected.
Running stereo calibration ...
done with RMS error = 1.60208
average epipolar error = 1.15512
I know the error should be below 1, but I only get an error below 1 with a small number of image pairs, so I'm not sure if my result is good or bad.
You should be able to get an error below 1, but it's not so bad. I also calibrate with around 100 images, and I often end up with a few images to discard because the detection was not reliable.
If you decreased the number of images down to 10, the calibration might overfit to those cases, and the reported error would then not be reliable.
In the calibration process, the problems I faced came from the calibration setup. My recommendations are the following:
Check that your calibration pattern is perfectly flat. In my case I printed on adhesive paper and glued it on a piece of glass.
Check that your calibration pattern is not symmetrical in rotation, otherwise the pose estimation could be wrong.
Check the intermediate pattern point detection. There are examples in OpenCV that draw the detected corners or circle centers.
The error can also be displayed for each frame. This helps you understand which images are problematic; if you see that those images actually have a detection problem, you can discard them (see the sketch after this list).
If you acquire videos and not images, both cameras should be synchronized with a hardware connection. In my case I cannot have such a link, therefore I built some kind of holder for the calibration target to keep it still, and I acquired only images, not videos.
This won't reduce your calibration error, but do use very different pattern positions to cover as much of the field of view as possible.
If your depth of field is small and you have blurry images before/after the focus because of that, change from the chessboard pattern to a circles pattern (functions also available in opencv).
If you don't have strong distortion in your images (e.g. a photo taken with an iPhone doesn't really show a strong fisheye-like distortion), consider forcing K3=0.
In my case, I fixed the principal point at the middle of the image, because otherwise the algorithm always found implausible values for it, as it did for K3.
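For reference, a rough Python sketch of those last two points plus the per-frame error check mentioned above (objpoints/imgpoints/image_size stand in for your own detection results; calibrateCameraExtended is available in recent OpenCV versions):

    import cv2

    # Force K3 = 0 and pin the principal point (it stays at the image centre
    # unless an initial guess is supplied with CALIB_USE_INTRINSIC_GUESS).
    flags = cv2.CALIB_FIX_K3 | cv2.CALIB_FIX_PRINCIPAL_POINT

    # calibrateCameraExtended also returns per-view RMS errors, which makes it
    # easy to spot and discard frames with detection problems.
    (rms, K, dist, rvecs, tvecs,
     std_intr, std_extr, per_view_err) = cv2.calibrateCameraExtended(
        objpoints, imgpoints, image_size, None, None, flags=flags)

    print("overall RMS:", rms)
    for i, e in enumerate(per_view_err.ravel()):
        print(f"frame {i}: RMS reprojection error = {e:.3f} px")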
Hope this helps a bit. Good luck!
I took the example code for calibrating a camera and undistorting images from this book: shop.oreilly.com/product/9780596516130.do
As far as I understand, the usual OpenCV camera calibration methods work fine for "normal" cameras.
When it comes to fisheye lenses, though, we have to use a vector of 8 distortion parameters instead of 5, together with the flag CV_CALIB_RATIONAL_MODEL in cvCalibrateCamera2.
At least, that's what it says in the OpenCV documentation.
So, when I use this on an array of images like this (Sample images from OCamCalib) I get the following results using cvInitUndistortMap: abload.de/img/rastere4u2w.jpg
Since the resulting images are cut out of the whole undistorted image, I went ahead and used cvInitUndistortRectifyMap (like it's described here stackoverflow.com/questions/8837478/opencv-cvremap-cropping-image). So I got the following results: abload.de/img/rasterxisps.jpg
And now my question is: why isn't the whole image undistorted? In some of my later results you can see that the laptop, for example, is still totally distorted. How can I accomplish even better results using the standard OpenCV methods?
I'm new to stackoverflow and I'm new to OpenCV as well, so please excuse any of my shortcomings when it comes to expressing my problems.
All chessboard corners have to be visible in order to be found. The algorithm expects a chessboard of a given size, such as 4x3 or 7x6 (for example). The white border around the chessboard should be visible too, or the dark squares may not be located precisely.
You still have high distortion at the image periphery after undistort() because the distortion is radial (that is, it increases with the radius) and the coefficients you found are wrong. They are wrong because the calibration minimizes the sum of squared errors in pixel coordinates, and you did not represent the periphery with enough samples.
To do: you need 20-40 chessboard pattern images if you use 8 distortion coefficients. Slant your boards at different angles, put them at different distances, and spread them around, especially at the periphery. Remember, the success of calibration depends on sampling, and also on seeing vanishing points clearly from your chessboard (hence the slanting and tilting).
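As a side note on the cropping question, with the modern Python API the same idea looks roughly like this - the rational model adds k4-k6, and getOptimalNewCameraMatrix with alpha=1 keeps the whole original field of view instead of cropping (point lists and image_size are placeholders):

    import cv2

    # Calibrate with the 8-coefficient rational model (k1..k6, p1, p2).
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        objpoints, imgpoints, image_size, None, None,
        flags=cv2.CALIB_RATIONAL_MODEL)

    # alpha=1 keeps all source pixels (with black borders) instead of cropping
    # to the valid region; alpha=0 gives the tightly cropped result.
    newK, valid_roi = cv2.getOptimalNewCameraMatrix(K, dist, image_size, alpha=1)
    map1, map2 = cv2.initUndistortRectifyMap(K, dist, None, newK, image_size, cv2.CV_16SC2)
    undistorted = cv2.remap(img, map1, map2, cv2.INTER_LINEAR)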
I am totally new to camera calibration techniques... I am using OpenCV chessboard technique... I am using a webcam from Quantum...
Here are my observations and steps:
I have kept each chess square side = 3.5 cm. It is a 7 x 5 chessboard with 6 x 4 internal corners. I am taking a total of 10 images in different views/poses at a distance of 1 to 1.5 m from the webcam.
I am following the C code in Learning OpenCV by Bradski for the calibration.
My code for calibration is:
cvCalibrateCamera2(object_points,image_points,point_counts,cvSize(640,480),intrinsic_matrix,distortion_coeffs,NULL,NULL,CV_CALIB_FIX_ASPECT_RATIO);
Before calling this function, I set the first and second elements along the diagonal of the intrinsic matrix to one, to keep the ratio of the focal lengths constant, and use CV_CALIB_FIX_ASPECT_RATIO.
As I change the distance to the chessboard, fx and fy change, with fx:fy staying almost equal to 1. The cx and cy values are in the range of 200 to 400, and fx and fy are in the range of 300 to 700 as I change the distance.
Presently I have set all the distortion coefficients to zero because I did not get good results when including them. My original image looked more handsome than the undistorted one!!
Am I doing the calibration correctly? Should I use any option other than CV_CALIB_FIX_ASPECT_RATIO? If yes, which one?
Hmm, are you looking for "handsome" or "accurate"?
Camera calibration is one of the very few subjects in computer vision where accuracy can be directly quantified in physical terms, and verified by a physical experiment. And the usual lesson is that (a) your numbers are just as good as the effort (and money) you put into them, and (b) real accuracy (as opposed to imagined) is expensive, so you should figure out in advance what your application really requires in the way of precision.
If you look up the geometrical specs of even very cheap lens/sensor combinations (in the megapixel range and above), it becomes readily apparent that sub-sub-mm calibration accuracy is theoretically achievable within a table-top volume of space. Just work out (from the spec sheet of your camera's sensor) the solid angle spanned by one pixel - you'll be dazzled by the spatial resolution you have within reach of your wallet. However, actually achieving REPEATABLY something near that theoretical accuracy takes work.
Here are some recommendations (from personal experience) for getting a good calibration experience with home-grown equipment.
If your method uses a flat target ("checkerboard" or similar), manufacture a good one. Choose a very flat backing (for the size you mention window glass 5 mm thick or more is excellent, though obviously fragile). Verify its flatness against another edge (or, better, a laser beam). Print the pattern on thick-stock paper that won't stretch too easily. Lay it after printing on the backing before gluing and verify that the square sides are indeed very nearly orthogonal. Cheap ink-jet or laser printers are not designed for rigorous geometrical accuracy, do not trust them blindly. Best practice is to use a professional print shop (even a Kinko's will do a much better job than most home printers). Then attach the pattern very carefully to the backing, using spray-on glue and slowly wiping with soft cloth to avoid bubbles and stretching. Wait for a day or longer for the glue to cure and the glue-paper stress to reach its long-term steady state. Finally measure the corner positions with a good caliper and a magnifier. You may get away with one single number for the "average" square size, but it must be an average of actual measurements, not of hopes-n-prayers. Best practice is to actually use a table of measured positions.
Watch your temperature and humidity changes: paper absorbs water from the air, and the backing dilates and contracts. It is amazing how many articles you can find that report sub-millimeter calibration accuracies without quoting the environment conditions (or the target's response to them). Needless to say, they are mostly crap. The lower thermal expansion coefficient of glass compared to common sheet metal is another reason for preferring the former as a backing.
Needless to say, you must disable the auto-focus feature of your camera, if it has one: focusing physically moves one or more pieces of glass inside your lens, thus changing (slightly) the field of view and (usually by a lot) the lens distortion and the principal point.
Place the camera on a stable mount that won't vibrate easily. Focus (and f-stop the lens, if it has an iris) as is needed for the application (not the calibration - the calibration procedure and target must be designed for the app's needs, not the other way around). Do not even think of touching camera or lens afterwards. If at all possible, avoid "complex" lenses - e.g. zoom lenses or very wide angle ones. For example, anamorphic lenses require models much more complex than stock OpenCV makes available.
Take lots of measurements and pictures. You want hundreds of measurements (corners) per image, and tens of images. Where data is concerned, the more the merrier. A 10x10 checkerboard is the absolute minimum I would consider. I normally worked at 20x20.
Span the calibration volume when taking pictures. Ideally you want your measurements to be uniformly distributed in the volume of space you will be working with. Most importantly, make sure to angle the target significantly with respect to the focal axis in some of the pictures - to calibrate the focal length you need to "see" some real perspective foreshortening. For best results use a repeatable mechanical jig to move the target. A good one is a one-axis turntable, which will give you an excellent prior model for the motion of the target.
Minimize vibrations and associated motion blur when taking photos.
Use good lighting. Really. It's amazing how often I see people realize late in the game that you need a generous supply of photons to calibrate a camera :-) Use diffuse ambient lighting, and bounce it off white cards on both sides of the field of view.
Watch what your corner extraction code is doing. Draw the detected corner positions on top of the images (in Matlab or Octave, for example), and judge their quality. Removing outliers early using tight thresholds is better than trusting the robustifier in your bundle adjustment code.
Constrain your model if you can. For example, don't try to estimate the principal point if you don't have a good reason to believe that your lens is significantly off-center w.r.t. the image; just fix it at the image center on your first attempt. The principal point location is usually poorly observed, because it is inherently confused with the center of the nonlinear distortion and by the component parallel to the image plane of the target-to-camera translation. Getting it right requires a carefully designed procedure that yields three or more independent vanishing points of the scene and a very good bracketing of the nonlinear distortion. Similarly, unless you have reason to suspect that the lens focal axis is really tilted w.r.t. the sensor plane, fix at zero the (1,2) component of the camera matrix. Generally speaking, use the simplest model that satisfies your measurements and your application needs (that's Ockham's razor for you).
When you have a calibration solution from your optimizer with low enough RMS error (a few tenths of a pixel, typically, see also Josh's answer below), plot the XY pattern of the residual errors (predicted_xy - measured_xy for each corner in all images) and see if it's a round-ish cloud centered at (0, 0). "Clumps" of outliers or non-roundness of the cloud of residuals are screaming alarm bells that something is very wrong - likely outliers due to bad corner detection or matching, or an inappropriate lens distortion model.
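A rough sketch of that residual plot in Python, assuming you kept the per-image point lists and the calibrateCamera outputs (K, dist, rvecs, tvecs; names are illustrative):

    import numpy as np
    import cv2
    import matplotlib.pyplot as plt

    # Residuals (predicted - measured) for every corner in every image.
    res = []
    for obj, img, rvec, tvec in zip(objpoints, imgpoints, rvecs, tvecs):
        proj, _ = cv2.projectPoints(obj, rvec, tvec, K, dist)
        res.append(proj.reshape(-1, 2) - img.reshape(-1, 2))
    res = np.vstack(res)

    plt.scatter(res[:, 0], res[:, 1], s=2)
    plt.axhline(0, color='k')
    plt.axvline(0, color='k')
    plt.gca().set_aspect('equal')
    plt.xlabel('x residual [px]')
    plt.ylabel('y residual [px]')
    plt.title('Reprojection residuals: should be a round cloud centred at (0, 0)')
    plt.show()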
Take extra images to verify the accuracy of the solution - use them to verify that the lens distortion is actually removed, and that the planar homography predicted by the calibrated model actually matches the one recovered from the measured corners.
This is a rather late answer, but for people coming to this from Google:
The correct way to check calibration accuracy is to use the reprojection error provided by OpenCV. I'm not sure why this wasn't mentioned anywhere in the answer or comments; you don't need to calculate it by hand - it's the return value of calibrateCamera. In Python it's the first return value (followed by the camera matrix, etc.).
The reprojection error is the RMS error between where the points would be projected using the intrinsic coefficients and where they are in the real image. Typically you should expect an RMS error of less than 0.5px - I can routinely get around 0.1px with machine vision cameras. The reprojection error is used in many computer vision papers, there isn't a significantly easier or more accurate way to determine how good your calibration is.
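In other words (a minimal Python sketch; the point lists come from your own detection code):

    import cv2

    # The first return value is the RMS reprojection error, in pixels.
    rms, camera_matrix, dist_coeffs, rvecs, tvecs = cv2.calibrateCamera(
        objpoints, imgpoints, (width, height), None, None)
    print(f"RMS reprojection error: {rms:.3f} px")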
Unless you have a stereo system, you can only work out where something is in 3D space up to a ray, rather than a point. However, as one can work out the pose of each planar calibration image, it's possible to work out where each chessboard corner should fall on the image sensor. The calibration process (more or less) attempts to work out where these rays fall and minimises the error over all the different calibration images. In Zhang's original paper, and subsequent evaluations, around 10-15 images seems to be sufficient; at this point the error doesn't decrease significantly with the addition of more images.
Other software packages like Matlab will give you error estimates for each individual intrinsic, e.g. focal length, centre of projection. I've been unable to make OpenCV spit out that information, but maybe it's in there somewhere. Camera calibration is now native in Matlab 2014a, but you can still get hold of the camera calibration toolbox which is extremely popular with computer vision users.
http://www.vision.caltech.edu/bouguetj/calib_doc/
Visual inspection is necessary, but not sufficient when dealing with your results. The simplest thing to look for is that straight lines in the world become straight in your undistorted images. Beyond that, it's impossible to really be sure if your cameras are calibrated well just by looking at the output images.
The routine provided by Francesco is good, follow that. I use a shelf board as my plane, with the pattern printed on poster paper. Make sure the images are well exposed - avoid specular reflection! I use a standard 8x6 pattern, I've tried denser patterns but I haven't seen such an improvement in accuracy that it makes a difference.
I think this answer should be sufficient for most people wanting to calibrate a camera - realistically unless you're trying to calibrate something exotic like a Fisheye or you're doing it for educational reasons, OpenCV/Matlab is all you need. Zhang's method is considered good enough that virtually everyone in computer vision research uses it, and most of them either use Bouguet's toolbox or OpenCV.
I'm using the EMGU OpenCV wrapper for c#. I've got a disparity map being created nicely. However for my specific application I only need the disparity values of very few pixels, and I need them in real time. The calculation is taking about 100 ms now, I imagine that by getting disparity for hundreds of pixel values rather than thousands things would speed up considerably. I don't know much about what's going on "under the hood" of the stereo solver code, is there a way to speed things up by only calculating the disparity for the pixels that I need?
First of all, you fail to mention what you are really trying to accomplish, and moreover, what algorithm you are using. E.g. StereoGC is really slow (i.e. not real-time) but usually far more accurate than both StereoSGBM and StereoBM. Those last two can be used in real time, provided a few conditions are met:
The size of the input images is reasonably small;
You are not using an extravagant set of parameters (for instance, a larger value for numberOfDisparities will increase computation time).
Don't expect miracles when it comes to accuracy though.
Apart from that, there is the issue of "just a few pixels". As far as I understand, the algorithms implemented in OpenCV usually rely on information from more than one pixel to determine the disparity value; e.g. they need a neighborhood to decide which pixel in image A maps to which pixel in image B. As a result, in general it is not possible to just discard every other pixel of the image (by the way, if you already knew the matching locations in both images, you would not need the stereo methods at all). So unless you can discard a large border of your input images, for which you know you'll never find your pixels of interest there, I'd say the answer to this part of your question is "no".
If you happen to know that your pixels of interest will always be within a certain rectangle of the input images, you can restrict the input to that rectangle by setting the image ROIs (regions of interest). Assuming OpenCV does not contain a bug here, this should speed up the computation a little.
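A sketch of that idea with the Python bindings (EmguCV exposes the same StereoSGBM parameters; the band coordinates here are made up):

    import cv2

    # rectL / rectR: already rectified grayscale images; the pixels of interest
    # are assumed to always fall inside this band (values are made up).
    x, y, w, h = 200, 100, 320, 240
    roiL = rectL[y:y + h, x:x + w]
    roiR = rectR[y:y + h, x:x + w]  # same offsets on both sides keep disparities valid

    sgbm = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=64,  # the "numberOfDisparities" mentioned above; multiple of 16
        blockSize=7)
    # SGBM returns fixed-point disparities scaled by 16.
    disp = sgbm.compute(roiL, roiR).astype('float32') / 16.0

    # disp[v, u] is the disparity at full-image pixel (x + u, y + v); note that
    # matches within numDisparities of the band's left edge cannot be found,
    # so pad the crop on that side accordingly.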
With a bit of googling you can find real-time examples of finding stereo correspondences using EmguCV (or plain OpenCV) with the GPU on YouTube. Maybe that could help you.
Disclaimer: this may have been a more complete answer if your question contained more detail.