I want to compute the extrinsic calibration of two cameras with respect to each other, and I am using the cv::stereoCalibrate() function to do this. However, the result does not correspond to reality. What could be wrong?
Setup: two cameras are mounted 7 meters high, facing each other and looking downwards. Their fields of view overlap a lot, and I captured the checkerboard images that I used for calibration in that overlap.
I am not flipping any of the images.
Do I need to flip the images, or do I need to do something else to indicate that the cameras are actually facing each other?
Note: The same function perfectly calibrates cameras that are next to each other facing in the same direction (like any typical stereo camera).
Thanks
In order to "tell that the cameras are actually facing each other", you have to specify imagePoints1 and imagePoints2 correctly, so that points with matching indices correspond to the same physical point.
If in your case the function works perfectly when the cameras face the same direction but fails with your configuration, a mismatch in point indexing is a likely cause (most probably the points are flipped both vertically and horizontally).
One way to debug this is to either draw indices near the points on each of the frames, or color-code them and make sure they match between the images.
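A minimal sketch of the re-indexing idea in plain Python, assuming each view's corners are stored row by row, left to right, as cv::findChessboardCorners() returns them. `reorder_grid` and `candidate_orderings` are hypothetical helpers, not OpenCV functions. One way to use them: keep camera 1's ordering, try each candidate ordering for camera 2, and keep the one with the lowest RMS error returned by cv::stereoCalibrate(). Given the answer above, `rot180` is the most likely case.

```python
MODES = ("identity", "flip_h", "flip_v", "rot180")


def reorder_grid(points, rows, cols, mode):
    """Re-index a row-major list of chessboard corners (rows x cols inner corners).

    flip_h reverses each row, flip_v reverses the row order, and rot180 does both,
    which for a row-major list is the same as reversing the whole list.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if len(points) != rows * cols:
        raise ValueError("expected rows * cols points")
    grid = [list(points[r * cols:(r + 1) * cols]) for r in range(rows)]
    if mode in ("flip_h", "rot180"):
        grid = [row[::-1] for row in grid]
    if mode in ("flip_v", "rot180"):
        grid = grid[::-1]
    return [p for row in grid for p in row]


def candidate_orderings(points, rows, cols):
    """All four re-indexings of one view, e.g. to try each for the second camera."""
    return {mode: reorder_grid(points, rows, cols, mode) for mode in MODES}
```

For a 2 x 3 grid labelled by index, `reorder_grid(list(range(6)), 2, 3, "rot180")` gives the list in reverse order.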
One question, though: why do you use cv::stereoCalibrate()? The setup you described doesn't seem to be a good use case for it. If you want to estimate the extrinsic parameters of the cameras, you can use cv::calibrateCamera(). The only downside is that it assumes the intrinsic parameters are the same for all provided views (i.e., all images were taken with the same or very similar cameras). If that is not the case, cv::stereoCalibrate() is indeed a better fit (but the documentation suggests that you still estimate each camera's intrinsic parameters individually using cv::calibrateCamera()).
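If you go the cv::calibrateCamera() route, each camera yields a board-to-camera pose per view (rvec/tvec; cv::Rodrigues() turns rvec into a rotation matrix). For a view in which both cameras see the board, the camera-1-to-camera-2 transform follows by composition, in the convention cv::stereoCalibrate() uses (X2 = R·X1 + T). A plain-Python sketch, assuming rotations are 3x3 nested lists; in practice you would combine several views rather than trust a single one.

```python
def mat_mul(A, B):
    """3x3 matrix product A @ B."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(A):
    return [list(row) for row in zip(*A)]


def mat_vec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]


def relative_pose(R1, t1, R2, t2):
    """Camera-1 -> camera-2 transform from two board -> camera poses of one view.

    X_cam1 = R1 X_board + t1 and X_cam2 = R2 X_board + t2 give
    X_cam2 = (R2 R1^T) X_cam1 + (t2 - R2 R1^T t1).
    """
    R = mat_mul(R2, transpose(R1))
    R_t1 = mat_vec(R, t1)
    T = [t2[i] - R_t1[i] for i in range(3)]
    return R, T
```

Example: with the board 5 units in front of each camera and camera 2 rotated 180° about y, camera 1 ends up 10 units in front of camera 2, as expected for cameras facing each other.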
So, I have a stereo camera with left and right cameras that are already calibrated. Since the precision of stereo vision depends heavily on the calibration, it would be useful if the system could detect whether it is slightly out of calibration, e.g., due to a temperature change or a mechanical shock that slightly changes the baseline or rotation between the two cameras.
My idea is that, for every new image pair taken by the stereo camera, the software tries to find matching points between the two images and recalculates the fundamental matrix to see if there is a big shift. However, finding matching points is error-prone, especially when no constraints are applied.
My question is: since I know the calibration should only shift slightly, is there a way to leverage the original calibration to apply a relaxed epipolar constraint when finding matching points between the two images, and perhaps a disparity constraint as well? For example, I can use the original calibration to calculate the distance of the feature points, and I roughly know the disparity will stay within a certain range even if the calibration has shifted. With such assumptions, I believe I can effectively avoid mismatched points between the left and right images and therefore get a reliable new fundamental matrix.
So is there a convenient way to relax the epipolar constraint by a few pixels and also specify a numDisparities for feature-point matching? Or is there a better way to do something similar?
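A minimal sketch of the "relaxed epipolar band plus disparity range" filter, in plain Python. Assumptions: F is the fundamental matrix from the original calibration with the convention x_right^T F x_left = 0, the pair is roughly rectified and horizontal (so disparity is x_left - x_right), and `band_px`, `min_disp`, `max_disp` are hypothetical tolerances you would tune. In OpenCV, cv::computeCorrespondEpilines() can compute the same epipolar lines in bulk.

```python
import math


def epipolar_distance(F, p_left, p_right):
    """Pixel distance from p_right to the epipolar line l = F @ [x, y, 1] of p_left."""
    x = (p_left[0], p_left[1], 1.0)
    a, b, c = (sum(F[i][k] * x[k] for k in range(3)) for i in range(3))
    norm = math.hypot(a, b)
    if norm == 0:
        return math.inf  # degenerate line: treat as "outside the band"
    return abs(a * p_right[0] + b * p_right[1] + c) / norm


def filter_matches(F, matches, band_px, min_disp, max_disp):
    """Keep (p_left, p_right) pairs inside the epipolar band and disparity range."""
    kept = []
    for p_l, p_r in matches:
        if epipolar_distance(F, p_l, p_r) > band_px:
            continue  # too far from the line predicted by the old calibration
        disparity = p_l[0] - p_r[0]  # assumes a roughly rectified, horizontal pair
        if min_disp <= disparity <= max_disp:
            kept.append((p_l, p_r))
    return kept
```

The surviving matches would then feed the new fundamental-matrix estimate; a robust estimator such as RANSAC is still advisable, since the band only removes gross mismatches.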
I'm currently working on an augmented reality application using a medical imaging program called 3DSlicer. My application runs as a module within the Slicer environment and is meant to provide the tools necessary to use an external tracking system to augment a camera feed displayed within Slicer.
Currently, everything is configured properly, so all I have left to do is automate the calculation of the camera's extrinsic matrix, which I decided to do using OpenCV's solvePnP() function. Unfortunately, this has been giving me some difficulty, as I am not getting the correct results.
My tracking system is configured as follows:
The optical tracker is mounted in such a way that the entire scene can be viewed.
Tracked markers are rigidly attached to a pointer tool, the camera, and a model that we have acquired a virtual representation for.
The pointer tool's tip was registered using a pivot calibration. This means that any values recorded using the pointer indicate the position of the pointer's tip.
Both the model and the pointer have 3D virtual representations that augment a live video feed as seen below.
The pointer and camera markers (the camera marker is referred to as C from here on) each return a homogeneous transform that describes their pose relative to the marker attached to the model (referred to as M from here on). The model's marker, being the origin, does not return any transformation.
I obtained two sets of points, one 2D and one 3D. The 2D points are the pixel coordinates of a chessboard's corners, while the 3D points are the corresponding world coordinates of those same corners relative to M. I recorded the 2D points with OpenCV's findChessboardCorners() function and the 3D points with the pointer. I then transformed the 3D points from M space to C space by multiplying them by C inverse. This was done because the solvePnP() function requires that 3D points be described relative to the world coordinate system of the camera, which in this case is C, not M.
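For reference, cv::solvePnP() returns the rvec/tvec that map points from whatever frame the object points are expressed in into the camera's optical frame. Moving points from M to C by "multiplying by C inverse" relies on inverting a rigid transform, which has a cheap closed form. A plain-Python sketch, assuming C is a 4x4 row-major homogeneous matrix [R t; 0 1] with R a pure rotation; `invert_rigid` and `apply` are hypothetical helper names.

```python
def invert_rigid(T):
    """Invert a 4x4 rigid transform [R t; 0 1] as [R^T, -R^T t; 0 1]."""
    R = [row[:3] for row in T[:3]]
    t = [T[i][3] for i in range(3)]
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]  # R transposed
    t_inv = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [Rt[i] + [t_inv[i]] for i in range(3)] + [[0, 0, 0, 1]]


def apply(T, p):
    """Apply a 4x4 homogeneous transform to a 3D point."""
    v = (p[0], p[1], p[2], 1)
    return [sum(T[i][k] * v[k] for k in range(4)) for i in range(3)]
```

Using the transpose instead of a general 4x4 inverse avoids numerical drift, but only holds if the tracker's rotation block is orthonormal.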
Once all of this was done, I passed the point sets into solvePnP(). The transformation I got was completely incorrect, though, and I am honestly at a loss as to what I did wrong. Adding to my confusion is the fact that OpenCV uses a different coordinate convention from OpenGL, which 3DSlicer is based on. If anyone can provide some assistance in this matter, I would be exceptionally grateful.
Also, if anything is unclear, please don't hesitate to ask. This is a pretty big project, so it was hard to distill everything down to just the issue at hand. I fully expect that things might get a little confusing for anyone reading this.
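On the OpenCV/OpenGL point: OpenCV's camera frame has x to the right, y down and the camera looking along +z, whereas the OpenGL eye frame has y up and looks along -z. Converting a world-to-camera transform from solvePnP() into an OpenGL-style view matrix therefore amounts to flipping the y and z axes, i.e. left-multiplying by diag(1, -1, -1, 1). A plain-Python sketch; whether 3DSlicer's scene camera uses exactly this convention is an assumption worth checking.

```python
# Flips y and z: OpenCV camera frame -> OpenGL eye frame.
CV_TO_GL = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1]]


def matmul4(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def cv_extrinsic_to_gl_view(R, t):
    """Build [R t; 0 1] from a solvePnP-style pose and express it in OpenGL's eye frame."""
    T = [list(R[i]) + [t[i]] for i in range(3)] + [[0, 0, 0, 1]]
    return matmul4(CV_TO_GL, T)
```

A point 5 units in front of an OpenCV camera (z = +5) ends up at z = -5 in the OpenGL eye frame, which is again in front of the camera. Remember that OpenGL usually expects matrices in column-major order when they are uploaded.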
Thank you!
UPDATE #1: It turns out I'm a giant idiot. I recorded only collinear points because I was too impatient to record the entire checkerboard. Of course, this meant there were nearly infinite solutions to the least-squares regression, as I had only locked the solution to 2 dimensions! My values are much closer to my ground truth now, and in fact the rotation matrix columns seem correct, except that they're all completely out of order. I'm not sure what could cause that, but it seems that my rotation matrix was mirrored across the center column. In addition, my translation components are negative when they should be positive, although their magnitudes seem to be correct. So now I've basically got all the right values in all the wrong order.
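A cheap guard against the collinear-points mistake is to check the 3D point set before calling solvePnP(). A plain-Python sketch; `are_collinear` is a hypothetical helper, and `max_offset` is in the same units as the points (for pointer measurements in mm, something like a few mm would be a plausible, but assumed, choice).

```python
import math


def are_collinear(points, max_offset):
    """True if every 3D point lies within max_offset of one common line."""
    if len(points) < 3:
        return True
    p0 = points[0]
    # Use the point farthest from p0 to define the line direction.
    far = max(points, key=lambda p: sum((p[i] - p0[i]) ** 2 for i in range(3)))
    d = [far[i] - p0[i] for i in range(3)]
    norm_d = math.sqrt(sum(c * c for c in d))
    if norm_d == 0:
        return True  # all points coincide
    for p in points:
        v = [p[i] - p0[i] for i in range(3)]
        cross = (v[1] * d[2] - v[2] * d[1],
                 v[2] * d[0] - v[0] * d[2],
                 v[0] * d[1] - v[1] * d[0])
        # |v x d| / |d| is the distance of p from the line through p0 along d.
        if math.sqrt(sum(c * c for c in cross)) / norm_d > max_offset:
            return False
    return True
```

A single chessboard row, e.g. `[(0, 0, 0), (30, 0, 0), (60, 0, 0)]`, would be flagged as collinear, which is exactly the degenerate case described in the update.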
Mirror/rotational ambiguity.
You basically need to reorient your coordinate frames by imposing the constraints that (1) the scene is in front of the camera and (2) the checkerboard axes are oriented as you expect them to be. This boils down to multiplying your calibrated transform by an appropriate ("hand-built") rotation and/or mirroring.
The basic problem is that the calibration target you are using, even when all the corners are seen, has at least a 180° rotational ambiguity unless color information is used. If some corners are missed, things can get even weirder.
You can often use prior information about the camera orientation w.r.t. the scene to resolve this kind of ambiguity, as I suggested above. However, in more dynamic situations, or if a greater degree of automation is needed when the target may be only partially visible, you'd be much better off using a target in which each small group of corners can be individually identified. My favorite is Matsunaga and Kanatani's "2D barcode" target, which uses sequences of square lengths with unique cross-ratios. See their paper for details.
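One concrete "hand-built" correction for the 180° case: re-express the board pose as if the board origin were at the opposite corner with its x and y axes reversed. If p_board = Rz(180°)·p_new + o, where o is the position of the opposite corner, then X_cam = R·p_board + t = (R·Rz)·p_new + (R·o + t). A plain-Python sketch, assuming R, t map board to camera coordinates and the board's corners lie on a regular grid in its z = 0 plane; `rotate_board_frame_180` is a hypothetical name.

```python
def rotate_board_frame_180(R, t, cols, rows, square):
    """Board pose after moving the origin to the opposite corner and turning the
    board axes 180 degrees about the board normal.

    R, t: board -> camera, X_cam = R @ X_board + t.
    cols, rows: number of inner corners along the board's x and y axes.
    square: side length of one square (same units as t).
    """
    Rz = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]  # 180 degrees about the board z axis
    offset = [(cols - 1) * square, (rows - 1) * square, 0]  # opposite corner
    R_new = [[sum(R[i][k] * Rz[k][j] for k in range(3)) for j in range(3)]
             for i in range(3)]
    t_new = [sum(R[i][k] * offset[k] for k in range(3)) + t[i] for i in range(3)]
    return R_new, t_new
```

You would apply it only when the recovered axes contradict your prior (for example, the board x axis points the "wrong" way in the image); the constraint that the scene is in front of the camera corresponds to t[2] > 0.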
I want to compute depth maps from stereo images. At present I am working with images from the internet, but I want to take my own stereo images so that I can work on them myself. How can I take good stereo images without much noise? I have a single camera. Is it necessary to do rectification? How much distance must be kept between the cameras?
Not sure I've understood your problem correctly; I'll try anyway.
I guess you're currently working with images from Middlebury or something similar. If you want to use similar algorithms, you have to rectify your images, because those algorithms assume that corresponding pixels lie on the same row in all images. If you actually want depth images (as opposed to disparity images), you also need the camera extrinsics.
Your setup should have two cameras, and you have to make sure they don't change their relative position/orientation; otherwise your rectification will fall apart. The first step is to calibrate your system to get the intrinsic and extrinsic camera parameters. For that you can either use an existing tool or roll your own with, for example, OpenCV (calib3d module). Print out a calibration board to calibrate your system. Afterwards you can take images and use the calibration to rectify them.
Regarding color noise:
You could make your aperture very small and use long exposure times. In my opinion, though, this is pointless, because real-world applications have to deal with such noise anyway.
In short, there are plenty of stereo images on the internet that are already rectified. If you want to take your own stereo images, you have to follow these three steps:
The relationship between the distance to the object z (mm) and the disparity D (pixels) is inverse: z = f·b/D, where f is the focal length in pixels and b is the camera separation (baseline) in mm. Select b so that you get at least several pixels of disparity at the farthest distance you care about, since that is where disparity is smallest.
If you know the camera intrinsic matrix and have compensated for radial distortion, you still have to rectify your images so that matches are located in the same row. For this you need to find the fundamental matrix, recover the essential matrix, apply rectifying homographies and update your intrinsic camera parameters... or use stereo pairs from the internet.
Low image noise is helped by brightly illuminated scenes, a large aperture, large pixel size, etc.; however, depending on your setup, you can still end up with a very noisy disparity map. One way to reduce this noise is to trade off accuracy and use larger correlation windows. Another way to clean up a disparity map is to use validation techniques such as:
error validation;
uniqueness validation or back-and-forth (left-right) validation;
blob-noise suppression, etc.
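Steps 1 and 3 above can be sketched together for one rectified image row: convert disparity to depth with z = f·b/D, but only where the back-and-forth (left-right) check agrees, and pick the baseline from the same formula. Plain Python; the row-wise layout, the rounding of disparities, and `max_diff` are assumptions, and the function names are hypothetical.

```python
def depth_with_lr_check(disp_left, disp_right, f_px, baseline_mm, max_diff=1):
    """Depth (mm) for one row of left-image disparities, z = f * b / D.

    disp_left[x]:  disparity of left pixel x (its match is at x - D in the right row).
    disp_right[x]: disparity of right pixel x (its match is at x + D in the left row).
    Pixels with D <= 0 or failing the left-right check become None.
    """
    depth = []
    for x, d in enumerate(disp_left):
        xr = x - int(round(d))
        if d <= 0 or not (0 <= xr < len(disp_right)):
            depth.append(None)
            continue
        if abs(d - disp_right[xr]) > max_diff:
            depth.append(None)  # inconsistent match: likely noise or occlusion
            continue
        depth.append(f_px * baseline_mm / d)
    return depth


def baseline_for_min_disparity(f_px, z_max_mm, min_disp_px):
    """Smallest baseline b (mm) with D = f * b / z >= min_disp_px at z = z_max_mm."""
    return min_disp_px * z_max_mm / f_px
```

For example, with f = 800 px, wanting at least 8 px of disparity at 10 m gives b = 8 · 10000 / 800 = 100 mm.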
In my experience:
- I did the rectification, so I had to obtain the fundamental matrix, and this may not be correct for some image pairs.
- Higher camera resolution is better for matching. I use OpenCV, which has an implementation of the BRISK descriptor; it was useful for me.
- Try to cover the same area and avoid unnecessary rotations.
- Once you understand the theory, OpenCV is a good friend. Here are some results, though I am still working on it:
Depth map:
Rectified images:
I'm currently working on a project that deals with reconstruction from a set of images using a multi-view stereo approach. As such, I need to know the pose of each image in space. I find matching features using SURF, and from the correspondences I compute the essential matrix.
Now comes the problem: It is possible to decompose the essential matrix with SVD, but this can lead to 4 different results, as I read in a book. How can I obtain the correct one, assuming this is possible?
What other algorithms can I use for this?
Wikipedia says:
"It turns out, however, that only one of the four classes of solutions can be realized in practice. Given a pair of corresponding image coordinates, three of the solutions will always produce a 3D point which lies behind at least one of the two cameras and therefore cannot be seen. Only one of the four classes will consistently produce 3D points which are in front of both cameras. This must then be the correct solution."
If you have the extrinsic calibration parameters for the camera in the first frame, or if you assume that it lies at a default calibration, say translation of (0,0,0) and rotation of (0,0,0), then you can determine which of the decompositions is the valid one.
Thanks to Zaphod's answer, I was able to solve my problem. Here's what I did:
First I calculated the Essential Matrix (E) from a set of point correspondences in both images.
Using SVD, I decomposed it into 2 solutions. Using the negated essential matrix -E (which also satisfies the same constraints), I arrived at 2 more solutions, for a total of 4 possible camera positions and orientations.
Then, for all solutions, I triangulated the point correspondences and determined which intersections lie in front of both cameras, by taking the dot product of each point's coordinates (relative to the camera center) with each camera's viewing direction. If both are positive, that intersection is in front of both cameras.
In the end the solution that delivers the most intersections in front of the cameras is the chosen one.
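The selection procedure above can be sketched in plain Python. Assumptions: image points are already normalized (pixel coordinates multiplied by K^-1), the four candidate (R, t) pairs come from your SVD decomposition with X2 = R·X1 + t, and triangulation uses the simple midpoint method rather than the linear method of cv::triangulatePoints(). Testing z > 0 in each camera frame is equivalent to the dot-product test described. In OpenCV 3 and later, cv::recoverPose() performs this cheirality check for you.

```python
def _dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def triangulate_midpoint(x1, x2, R, t):
    """Midpoint of the closest approach of the two viewing rays, in camera-1 coordinates."""
    d1 = (x1[0], x1[1], 1.0)                                   # ray 1 from the origin
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]       # R transposed
    c2 = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]  # camera-2 center
    ray2 = (x2[0], x2[1], 1.0)
    d2 = [sum(Rt[i][k] * ray2[k] for k in range(3)) for i in range(3)]  # ray 2 direction
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    e, f = _dot(d1, c2), _dot(d2, c2)
    det = b * b - a * c
    if abs(det) < 1e-12:
        return None                                            # (nearly) parallel rays
    s = (b * f - c * e) / det                                  # parameter along ray 1
    u = (a * f - b * e) / det                                  # parameter along ray 2
    p1 = [s * d1[i] for i in range(3)]
    p2 = [c2[i] + u * d2[i] for i in range(3)]
    return [(p1[i] + p2[i]) / 2 for i in range(3)]


def count_in_front(x1s, x2s, R, t):
    """Number of correspondences that triangulate in front of both cameras."""
    count = 0
    for x1, x2 in zip(x1s, x2s):
        X = triangulate_midpoint(x1, x2, R, t)
        if X is None:
            continue
        X2 = [sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]
        if X[2] > 0 and X2[2] > 0:
            count += 1
    return count


def pick_pose(candidates, x1s, x2s):
    """Index of the (R, t) candidate with the most points in front, plus all counts."""
    counts = [count_in_front(x1s, x2s, R, t) for R, t in candidates]
    best = max(range(len(candidates)), key=counts.__getitem__)
    return best, counts
```

Counting over all correspondences, rather than testing a single point, makes the choice robust to a few bad matches near the baseline.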
Does stereo calibration still work if the right image is scaled slightly differently than the left, or vice versa?
No, for two reasons:
The triangulation of the 3D point will be affected
Your correspondences will be inaccurate if you are using scale-variant interest points.
Yes, stereo calibration can still work if the two images have different scales. You have to make sure the calibration takes the difference into account (so the default OpenCV version won't work), and for best results you should make sure the cameras are synchronized.
It will be less accurate (more correspondence errors, as Jacob notes).
The field of view of the stereo pair will be restricted to the smaller of the two images, and then only to the overlapping area between them.
You will probably have to write your own calibration and rectification code. I'm not aware of any libraries that can do it.
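If the scale difference comes from a known resize, one option (an assumption, not something the answers above propose) is to fold it into that camera's intrinsic matrix, since resizing an image only changes K; OpenCV's distortion coefficients act on normalized coordinates and stay the same. A plain-Python sketch using the half-pixel-center convention u' = sx·(u + 0.5) - 0.5; the simpler cx·sx differs from it by (sx - 1)/2 pixels. `scale_intrinsics` is a hypothetical helper.

```python
def scale_intrinsics(K, sx, sy):
    """Intrinsic matrix of an image resized by sx horizontally and sy vertically.

    Derived from u = fx*x + skew*y + cx and u' = sx*(u + 0.5) - 0.5
    (and likewise for v), i.e. pixel centers at integer coordinates.
    """
    fx, fy = K[0][0] * sx, K[1][1] * sy
    skew = K[0][1] * sx
    cx = sx * (K[0][2] + 0.5) - 0.5
    cy = sy * (K[1][2] + 0.5) - 0.5
    return [[fx, skew, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]
```

For a 640 x 480 camera with f = 800 px and the principal point at the image center, doubling the image size gives f = 1600 px and a principal point at (639.5, 479.5).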