So, I have a stereo camera with left and right cameras that are already calibrated. Since the precision of stereo vision highly depends on the calibration, it would be useful if the system can detect whether itself is slightly out of calibration, e.g, due to temperature change or mechanical shock that changes the baseline/rotation of the two cameras slightly
So my thought is for every new image pair taken by the stereo camera, the software try to find matching points between the two images, and recalculate the fundamental matrix to see if there is a big shift. However, finding matching points is error prone, especially when no constrains applied
My question is: since I know there should be just a slight shift of the calibration, is there a way to leverage the original calibration to enable a relaxed epipolar constrains on finding the matching points between the two images? maybe as well as a disparity constrain. e.g., I use the original calibration to calculate the distance of the feature points, and I roughly know the disparity will still be within a certain range even the calibration shifted. With such assumptions, I believe I can effectively avoid mismatched points between left and right images, therefore ensure my new fundamental matrix calculation.
So I wonder is there a convenient way to relax the epipolar constrain by a few pixels, and also specify a numDisparities for feature point matching? Or maybe there is a better way to do similar things.
Related
I am trying to reconstruct the real-world coordinates of 3D points from two images taken from the same camera. The camera is not calibrated, but the movement (translation and rotation) is known. In short:
Requirement:
No calibration
Extra constraints other than image point correspondences:
Known camera translation and rotation
Same camera used in all views
I understand that, from image point correspondences alone, a scene can be reconstructed only up to a projective transformation. With more constraints, an affine or similarity reconstruction may be done. In my case, I need a similarity reconstruction.
Given the above constraints, is a similarity reconstruction possible? If possible, how should I go about doing it?
I have tried to attack the problem from a few angles. Since I am not mathematically fluent, I try to use opencv as much as possible.
findFundamentalMat() from the two images, hopefully extract the two camera matrices somehow, then triangulatePoints(). As you could have guessed, I got stuck in the middle, unable to obtain camera matrices from fundamental matrix.
The textbook "Multiple View Geometry in Computer Vision" (by Hartley and Zisserman) gives an expression (p.256, Result 9.14) that expresses the camera matrices in terms of fundamental matrix and one of the epipoles. However, without knowing the camera's intrinsic parameters (requirement: no calibration), I don't see how I can get the epipole.
I also try to treat my problem as a stereo system and use opencv's stereo*** functions. But they all seem to require human intervention to calibrate, which violates my requirement.
So, that's why I ask the question here today. The key is still, given those extra constraints, is a similarity reconstruction possible? I am not smart enough to understand the wealth of knowledge out there, and not able to come up with my own solution. Any help is appreciated.
I am trying to calibrate a camera using a checkerboard by the well known Zhang's method followed by bundle adjustment, which is available in both Matlab and OpenCV. There are a lot of empirical guidelines but from my personal experience the accuracy is pretty random. It could sometimes be really good but also sometimes really bad. The result actually can vary quite a bit just by simply placing the checkerboard at different locations. Suppose the target camera is rectilinear with 110 degree horizontal FOV.
Does the number of squares in the checkerboard affect the accuracy? Zhang uses 8x8 in his original paper without really explaining why.
Does the length of the square affect the accuracy? Zhang uses 17cm x 17cm without really explaining why.
What is the optimal number of snap shots of different checkerboard position/orientation? Zhang uses 5 images only. I saw people suggesting 20~30 images with checkerboards at various angles, fills the entire field of view, tilted to the left, right, top and bottom, and suggested there should be no checkerboard placed at similar position/orientation otherwise the result will be biased towards that position/orientation. Is this correct?
The goal is to figure out a workflow to get consistent calibration result.
If the accuracy you get is "pretty random" then you are likely not doing it right: with stable optics and a well conducted procedure you should consistently be getting RMS projection errors within a few tenths of a pixel. Whether this corresponds to variances of millimeters or meters in 3D space depends, of course, on your optics and sensor resolution (calibration is not a way around physics).
I wrote time ago a few suggestions in this answer, and I recommend you follow them. In particular, pay attention to locking the focus distance (I have seen & heard countless people trying to calibrate a camera on autofocus, and be sorely disappointed). As for the size of the target, again it depends on your optics and camera resolution, but generally speaking the goals are (1) to fill with measurements both the field of view and the volume of space you'll be working with, and (2) to observe significant perspective foreshortening, because that is what constrains the solution for the FOV. Good luck!
[Ed.to address a comment]
Concerning variations on the parameter values across successive calibrations, the first thing I'd do is calculate the cross RMS errors, i.e. the RMS error on dataset 1 with the camera calibrated on dataset 2, and vice versa. If either is significantly higher than the calibration errors, it's an indication that the camera has changed between the two calibrations and so all odds are off. Do you have auto-{focus,iris,zoom,stabilization} on? Turn them all off: auto-anything is the bane of calibration, with the only exception of exposure time. Otherwise, you need to see if the variations you observe on the parameters are actually meaningful (hint, they often are not). A variation of the focal length in pixels of several parts per thousand is probably irrelevant with today's sensor resolutions - you can verify that by expressing it in mm, and comparing it to the dot pitch of the sensor. Also, variations of the position of the principal point in the order of tens of pixels are common, since it is poorly observed unless your calibration procedure is very carefully designed to estimate it.
Ideally you want to place your checkerboard at roughly the same distance from the the camera, as the distance at which you want to do your measurements. So your checkerboard squares must be large enough to be resolvable from that distance. You also do want to cover the entire field of view with points, especially close to the edges and corners of the frame. Also, the smaller the board is, the more images you should take to cover the entire field of view. So 20-30 images is usually a good rule of thumb.
Another thing is that the checkerboard should be asymmetric. Ideally, you want to have an even number of squares along one side, and an odd number of squares along the other. This way the board's in-plane orientation is unambiguous.
Also, I would suggest that you try the Camera Calibrator app in MATLAB. At the very least, look at the documentation, which has a lot of useful suggestions for calibrating cameras.
Is there any particular reason why we need multiple poses (e.g. varying z or rotation) to obtain the focal length and principal point for the camera matrix? In other words, is it sufficient to calibrate a pinhole camera with a single pose? i.e. by keeping the location of the calibration object (let's say a standard checkerboard) constant?
I assume you are asking in the context of OpenCV-like camera calibration using images of a planar target. The reference for the algorithm used by OpenCV is Z. Zhang's now classic paper . The discussion in the top half of page 6 shows that n >= 3 images are necessary for calibrating all 5 parameters of a pinhole camera matrix. Imposing constraints on the parameters reduces the number of needed images to a theoretical minimum of one.
In practice you need more for various reasons, among them:
The need to have enough measurements to overcome "noise" and "random" corner detection errors, while using a practical target with well-separated corners.
The difference between measuring data and observing (constraining) model parameters.
Practical limitations of physical lenses, e.g. depth of field.
As an example for the second point, the ideal target pose for calibrating the nonlinear lens distortion (barrel, pincushion, tangential, etc.) is frontal-facing, covering the whole field of view, because it produces a large number of well-separated and aligned corners over the image, all with approximately the same degree of blur. However, this is exactly the worst pose you can use in order to estimate the field of view / focal length, as for that purpose you need to observe significant perspective foreshortening.
Likewise, it is possible to show that the location of the principal point is well constrained by a set of images showing the vanishing points of multiple pencils of parallel lines. This is important because that location is inherently confused by the component parallel to the image plane of the relative motion between camera and target. Thus the vanishing points help "guide" the optimizer's solution toward the correct one, in the common case where the target does translate w.r.t the camera.
I am totally new to camera calibration techniques... I am using OpenCV chessboard technique... I am using a webcam from Quantum...
Here are my observations and steps..
I have kept each chess square side = 3.5 cm. It is a 7 x 5 chessboard with 6 x 4 internal corners. I am taking total of 10 images in different views/poses at a distance of 1 to 1.5 m from the webcam.
I am following the C code in Learning OpenCV by Bradski for the calibration.
my code for calibration is
cvCalibrateCamera2(object_points,image_points,point_counts,cvSize(640,480),intrinsic_matrix,distortion_coeffs,NULL,NULL,CV_CALIB_FIX_ASPECT_RATIO);
Before calling this function I am making the first and 2nd element along the diagonal of the intrinsic matrix as one to keep the ratio of focal lengths constant and using CV_CALIB_FIX_ASPECT_RATIO
With the change in distance of the chess board the fx and fy are changing with fx:fy almost equal to 1. there are cx and cy values in order of 200 to 400. the fx and fy are in the order of 300 - 700 when I change the distance.
Presently I have put all the distortion coefficients to zero because I did not get good result including distortion coefficients. My original image looked handsome than the undistorted one!!
Am I doing the calibration correctly?. Should I use any other option than CV_CALIB_FIX_ASPECT_RATIO?. If yes, which one?
Hmm, are you looking for "handsome" or "accurate"?
Camera calibration is one of the very few subjects in computer vision where accuracy can be directly quantified in physical terms, and verified by a physical experiment. And the usual lesson is that (a) your numbers are just as good as the effort (and money) you put into them, and (b) real accuracy (as opposed to imagined) is expensive, so you should figure out in advance what your application really requires in the way of precision.
If you look up the geometrical specs of even very cheap lens/sensor combinations (in the megapixel range and above), it becomes readily apparent that sub-sub-mm calibration accuracy is theoretically achievable within a table-top volume of space. Just work out (from the spec sheet of your camera's sensor) the solid angle spanned by one pixel - you'll be dazzled by the spatial resolution you have within reach of your wallet. However, actually achieving REPEATABLY something near that theoretical accuracy takes work.
Here are some recommendations (from personal experience) for getting a good calibration experience with home-grown equipment.
If your method uses a flat target ("checkerboard" or similar), manufacture a good one. Choose a very flat backing (for the size you mention window glass 5 mm thick or more is excellent, though obviously fragile). Verify its flatness against another edge (or, better, a laser beam). Print the pattern on thick-stock paper that won't stretch too easily. Lay it after printing on the backing before gluing and verify that the square sides are indeed very nearly orthogonal. Cheap ink-jet or laser printers are not designed for rigorous geometrical accuracy, do not trust them blindly. Best practice is to use a professional print shop (even a Kinko's will do a much better job than most home printers). Then attach the pattern very carefully to the backing, using spray-on glue and slowly wiping with soft cloth to avoid bubbles and stretching. Wait for a day or longer for the glue to cure and the glue-paper stress to reach its long-term steady state. Finally measure the corner positions with a good caliper and a magnifier. You may get away with one single number for the "average" square size, but it must be an average of actual measurements, not of hopes-n-prayers. Best practice is to actually use a table of measured positions.
Watch your temperature and humidity changes: paper adsorbs water from the air, the backing dilates and contracts. It is amazing how many articles you can find that report sub-millimeter calibration accuracies without quoting the environment conditions (or the target response to them). Needless to say, they are mostly crap. The lower temperature dilation coefficient of glass compared to common sheet metal is another reason for preferring the former as a backing.
Needless to say, you must disable the auto-focus feature of your camera, if it has one: focusing physically moves one or more pieces of glass inside your lens, thus changing (slightly) the field of view and (usually by a lot) the lens distortion and the principal point.
Place the camera on a stable mount that won't vibrate easily. Focus (and f-stop the lens, if it has an iris) as is needed for the application (not the calibration - the calibration procedure and target must be designed for the app's needs, not the other way around). Do not even think of touching camera or lens afterwards. If at all possible, avoid "complex" lenses - e.g. zoom lenses or very wide angle ones. For example, anamorphic lenses require models much more complex than stock OpenCV makes available.
Take lots of measurements and pictures. You want hundreds of measurements (corners) per image, and tens of images. Where data is concerned, the more the merrier. A 10x10 checkerboard is the absolute minimum I would consider. I normally worked at 20x20.
Span the calibration volume when taking pictures. Ideally you want your measurements to be uniformly distributed in the volume of space you will be working with. Most importantly, make sure to angle the target significantly with respect to the focal axis in some of the pictures - to calibrate the focal length you need to "see" some real perspective foreshortening. For best results use a repeatable mechanical jig to move the target. A good one is a one-axis turntable, which will give you an excellent prior model for the motion of the target.
Minimize vibrations and associated motion blur when taking photos.
Use good lighting. Really. It's amazing how often I see people realize late in the game that you need a generous supply of photons to calibrate a camera :-) Use diffuse ambient lighting, and bounce it off white cards on both sides of the field of view.
Watch what your corner extraction code is doing. Draw the detected corner positions on top of the images (in Matlab or Octave, for example), and judge their quality. Removing outliers early using tight thresholds is better than trusting the robustifier in your bundle adjustment code.
Constrain your model if you can. For example, don't try to estimate the principal point if you don't have a good reason to believe that your lens is significantly off-center w.r.t the image, just fix it at the image center on your first attempt. The principal point location is usually poorly observed, because it is inherently confused with the center of the nonlinear distortion and by the component parallel to the image plane of the target-to-camera's translation. Getting it right requires a carefully designed procedure that yields three or more independent vanishing points of the scene and a very good bracketing of the nonlinear distortion. Similarly, unless you have reason to suspect that the lens focal axis is really tilted w.r.t. the sensor plane, fix at zero the (1,2) component of the camera matrix. Generally speaking, use the simplest model that satisfies your measurements and your application needs (that's Ockam's razor for you).
When you have a calibration solution from your optimizer with low enough RMS error (a few tenths of a pixel, typically, see also Josh's answer below), plot the XY pattern of the residual errors (predicted_xy - measured_xy for each corner in all images) and see if it's a round-ish cloud centered at (0, 0). "Clumps" of outliers or non-roundness of the cloud of residuals are screaming alarm bells that something is very wrong - likely outliers due to bad corner detection or matching, or an inappropriate lens distortion model.
Take extra images to verify the accuracy of the solution - use them to verify that the lens distortion is actually removed, and that the planar homography predicted by the calibrated model actually matches the one recovered from the measured corners.
This is a rather late answer, but for people coming to this from Google:
The correct way to check calibration accuracy is to use the reprojection error provided by OpenCV. I'm not sure why this wasn't mentioned anywhere in the answer or comments, you don't need to calculate this by hand - it's the return value of calibrateCamera. In Python it's the first return value (followed by the camera matrix, etc).
The reprojection error is the RMS error between where the points would be projected using the intrinsic coefficients and where they are in the real image. Typically you should expect an RMS error of less than 0.5px - I can routinely get around 0.1px with machine vision cameras. The reprojection error is used in many computer vision papers, there isn't a significantly easier or more accurate way to determine how good your calibration is.
Unless you have a stereo system, you can only work out where something is in 3D space up to a ray, rather than a point. However, as one can work out the pose of each planar calibration image, it's possible to work out where each chessboard corner should fall on the image sensor. The calibration process (more or less) attempts to work out where these rays fall and minimises the error over all the different calibration images. In Zhang's original paper, and subsequent evaluations, around 10-15 images seems to be sufficient; at this point the error doesn't decrease significantly with the addition of more images.
Other software packages like Matlab will give you error estimates for each individual intrinsic, e.g. focal length, centre of projection. I've been unable to make OpenCV spit out that information, but maybe it's in there somewhere. Camera calibration is now native in Matlab 2014a, but you can still get hold of the camera calibration toolbox which is extremely popular with computer vision users.
http://www.vision.caltech.edu/bouguetj/calib_doc/
Visual inspection is necessary, but not sufficient when dealing with your results. The simplest thing to look for is that straight lines in the world become straight in your undistorted images. Beyond that, it's impossible to really be sure if your cameras are calibrated well just by looking at the output images.
The routine provided by Francesco is good, follow that. I use a shelf board as my plane, with the pattern printed on poster paper. Make sure the images are well exposed - avoid specular reflection! I use a standard 8x6 pattern, I've tried denser patterns but I haven't seen such an improvement in accuracy that it makes a difference.
I think this answer should be sufficient for most people wanting to calibrate a camera - realistically unless you're trying to calibrate something exotic like a Fisheye or you're doing it for educational reasons, OpenCV/Matlab is all you need. Zhang's method is considered good enough that virtually everyone in computer vision research uses it, and most of them either use Bouguet's toolbox or OpenCV.
I am working with a set of calibrated images that form a ring around a foreground object (1). I used Fusiello's method (1) to rectify adjacent pairs of images, and then I performed disparity estimation.
When I take the matched points from a stereo pair and triangulate them, it forms an accurate point cloud. Unfortunately, when I triangulate the points from another stereo image pair, this point cloud never aligns correctly with the original cloud.
Should calibrated, rectified images' point clouds merge together automatically?
Thanks in advance for any help you can offer.
This might be due to the accuracy of calibration - both intrinsic (i.e. the same camera model - and how it handles distortion) and extrinsic (i.e. the camera pose in real space). Together, of course, these dictate the ultimate accuracy of your re-projection.
Do you have a measure of error for camera calibration - in terms of MSE re-projection?
Cumulative error is often noticeable in my experience if simply iterating over subsequent images. Some form of global optimisation often needs to be performed to first correct positions for all the camera poses.
The accuracy of your disparity estimation is also a factor. Not only in terms of the algorithm you using, but also in relation to the stereo baseline and how it relates to the size/nature of the object in question (how concave/convex), and how many sampling of the images you are taking (and the quality of those images - exposure/depth-of-field/etc).
Fundamentally, just how "off" are your point clouds? Are they close to being aligned (you could do a bit of ICP before triangulation...). Are they closer in the "centre" of the re-projection? Are they worse for projections taken from opposing images on opposite sides of the object?
Remember as well that (due to the discrete sampling) you shouldn't expect points to ever be re-projected exactly "on-top" on one another. Some form of binning operation during the triangulation pipeline usually occurs for handling this (hence most of the research work in visual hull -> voxels -> marching cubes -> triangulated surface around this...)
Have you checked out MeshLab BTW?