I have already computed the intrinsics (camera matrix and distortion coefficients) for each camera in the lab.
I have since moved the cameras out into the real-world field.
I used about 6-10 known locations in the real world to estimate each camera's pose using solvePnP(), so I also have both cameras' rotations and translations.
I now want to use the two cameras to compute stereo correspondence.
Question is:
Do I have to use stereoCalibrate()?
Or can I call stereoRectify() right away, using the already known intrinsics?
stereoRectify() expects a rotation and translation vector/matrix; the documentation says it expects:
"The rotation matrix between the 1st and the 2nd cameras’ coordinate systems."
Since I have the pose of both cameras, can I simply subtract the two translation vectors and rotation vectors I got from solvePnP, and pass the result to stereoRectify()?
(Both cameras use the same common object-point reference system)
Calibrating the cameras to the world (e.g. your known locations) is different from calibrating them with respect to each other. In your case, since both poses share the same object-point reference frame, the inter-camera transform can be computed from the solvePnP results: with world-to-camera poses (R1, t1) and (R2, t2), the relative rotation is R = R2·R1ᵀ and the relative translation is T = t2 − R·t1. Note that simply subtracting the translation vectors only gives the baseline expressed in world coordinates, not in camera coordinates, and rotations cannot be subtracted at all. If you'd rather avoid the bookkeeping, use the stereo calibration provided by OpenCV.
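For what it's worth, the composition can be written in a few lines of numpy. This is a sketch under the solvePnP convention; in practice the rotation matrices would come from cv2.solvePnP followed by cv2.Rodrigues, and the resulting pair is what stereoRectify wants:

```python
import numpy as np

def relative_pose(R1, t1, R2, t2):
    """Compose two world-to-camera poses into the camera-1 -> camera-2 transform.

    solvePnP's rvec/tvec satisfy x_cam = R @ x_world + t (convert rvec to a
    matrix with cv2.Rodrigues). Points then map between the cameras as
    x_cam2 = R @ x_cam1 + T, with:
    """
    R = R2 @ R1.T
    T = t2 - R @ t1
    return R, T  # the (R, T) pair that stereoRectify expects
```

Note that T = t2 - t1 only coincides with this result in the special case R1 = R2.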
Related
I have a vehicle with two cameras, left and right. Is there a difference between calibrating each camera separately and performing "stereo calibration"? I am asking because I noticed in the OpenCV documentation that there is a stereoCalibrate function, and also a stereo calibration tool for MATLAB. If I do separate camera calibration on each and then perform a depth calculation using the undistorted images of each camera, will the results be the same?
I am not sure what the difference is between the two methods. I performed normal camera calibration for each camera separately.
For intrinsics, it doesn't matter. The added information ("pair of cameras") might make the calibration a little better though.
Stereo calibration gives you the extrinsics, i.e. transformation matrices between cameras. That's for... stereo vision. If you don't perform stereo calibration, you would lack the extrinsics, and then you can't do any depth estimation at all, because that requires the extrinsics.
TL;DR
You need stereo calibration if you want 3D points.
Long answer
There is a huge difference between single and stereo camera calibration.
The output of single camera calibration is the intrinsic parameters only (i.e. the 3x3 camera matrix and a number of distortion coefficients, depending on the model used). In OpenCV this is accomplished by cv2.calibrateCamera. You may check my custom library that helps reduce the boilerplate.
When you do stereo calibration, its output is given by the intrinsics of both cameras and the extrinsic parameters.
In OpenCV this is done with cv2.stereoCalibrate. OpenCV fixes the world origin in the first camera and then you get a rotation matrix R and translation vector t to go from the first camera (origin) to the second one.
So, why do we need extrinsics? If you are using a stereo system for 3D scanning then you need those (and the intrinsics) to do triangulation, so to obtain 3D points in the space: if you know the projection of a general point p in the space on both cameras, then you can calculate its position.
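To make the triangulation step concrete, here is a minimal DLT sketch, essentially what cv2.triangulatePoints does; the projection matrices would be built as K @ [R | t] from the intrinsics and extrinsics (all values in the usage below are synthetic):

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point seen by two cameras.

    P1, P2: 3x4 projection matrices K @ [R | t] (intrinsics + extrinsics);
    x1, x2: the point's pixel coordinates (u, v) in each image.
    """
    # Each observation contributes two linear constraints on the homogeneous
    # 3D point X; stack them and take the null space via SVD.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    X = np.linalg.svd(A)[2][-1]
    return X[:3] / X[3]  # de-homogenize to a 3D point
```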
To add something to what @Christoph correctly answered before, the intrinsics should be almost the same; however, cv2.stereoCalibrate may improve the calculation of the intrinsics if the flag CALIB_FIX_INTRINSIC is not set. This happens because the system composed of the two cameras and the calibration board is solved as a whole by numerical optimization.
I want to get the extrinsic parameters of two cameras looking at the same view. For this I followed the procedure laid out in several textbooks, lectures, etc.
Computed matches in both images using SIFT.
Computed the essential matrix using OpenCV's cv2.findEssentialMat.
Recovered the correct R and t from the four solutions using cv2.recoverPose().
From my understanding, the translation is recovered only up to scale. What do I have to do to get the absolute translation? I do not have any known objects in the scene; at most I will have lane lines. Is there a way to use the lane line information to get the absolute translation?
I found this post on dsp stackexchange that partly addresses your problem. As you have found, the scale of the translation cannot be inferred from the essential matrix, you need more information. This makes sense, as there is an ambiguity of size and shape if your only information is point correspondences.
How to infer scale
If you need to know the camera translation scale, you will need to know some scene geometry. That is, something you can use as a reference to determine the extent of the translation, e.g. the coordinates of a calibration object in the scene. You could then use a pose estimation method like Perspective-n-Point (PnP).
I found this lecture by Willem Hof on PnP quite clear and concise; it includes code screenshots.
Note that when performing PnP you have multiple unknowns. Your first camera was only assumed to be [I|0] during essential-matrix decomposition, so its world pose is entirely unknown; PnP pins it down. Once the first camera is known, the second camera's pose is the relative pose composed with P1, so only one unknown parameter remains for the second camera: the scale of its translation.
Why you cannot infer scale of translation
For example, if you have two images of a ball and many point correspondences, taken with calibrated cameras with unknown positions and poses: then is it a normal football or a mountain-sized ball sculpture? Well, we could use the essential matrix to get relative poses of the two cameras and triangulate a 3D reconstruction of the ball. But would we know the scale? Sure we know the shape of the ball now, but what is the distance between the triangulated points? That information is not present. You can infer the camera's relative rotation; one is in front of the ball (denote this camera as [I | 0] ) and the other is on the side of the ball. You also know in which direction the camera traveled (translation), but not how far. For a large object, the translation would be of a larger scale. Again, you do know the relative translation direction and the relative rotations of two cameras from essential matrix decomposition, which is a valuable constraint.
There are a number of calibration tutorials to calibrate camera images of chessboards in EMGU (OpenCV). They all end up calibrating and then undistorting an image for display. That's cool and all but I need to do machine vision where I am taking an image, identifying the location of a corner or blob or feature in the image and then translating the location of that feature in pixels into real world X, Y coordinates.
Pixel -> mm.
Is this possible with EMGU? If so, how? I'd hate to spend a bunch of time learning EMGU and then not be able to do this crucial function.
Yes, it's certainly possible; this is the "bread and butter" of OpenCV.
The calibration you are describing, in terms of removing distortions, is a prerequisite to this process. After which, the following applies:
The Intrinsic calibration, or "camera matrix" is the first of two required matrices. The second is the Extrinsic calibration of the camera which is essentially the 6 DoF transform that describes the physical location of the sensor center relative to a coordinate reference frame.
All of the distortion coefficients and the intrinsic and extrinsic calibrations are available from a single function in Emgu.CV: CvInvoke.CalibrateCamera. This process is best explained by one of the many tutorials you have described.
After that, CvInvoke.ProjectPoints applies the transforms above to map 3D world coordinates to 2D pixel locations. Going the other way (pixel -> mm) means inverting that projection, which is only well-defined if you constrain the point to a known surface, typically the Z = 0 world plane your chessboard defined.
The key to doing this successfully is providing comprehensive IInputArray objectPoints and IInputArray imagePoints to CvInvoke.CalibrateCamera. Be sure to provide "excitation" by using many images from many different perspectives.
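In OpenCV-Python terms (the same math applies through EMGU's CvInvoke wrappers; the helper below is my own sketch, not a library function), a pixel maps back to a unique world point only once you pin it to a known plane such as the chessboard plane Z = 0:

```python
import numpy as np

def pixel_to_world_xy(u, v, K, R, t):
    """Back-project pixel (u, v) to world coordinates, assuming the point
    lies on the world plane Z = 0 (e.g. the chessboard / work surface).

    K: 3x3 intrinsics; R: 3x3 rotation; t: 3-vector (world -> camera).
    Returns (X, Y) in the calibration's world units (e.g. mm).
    """
    # For Z = 0 the projection collapses to a plane-to-image homography.
    H = K @ np.column_stack([R[:, 0], R[:, 1], t])
    w = np.linalg.solve(H, np.array([u, v, 1.0]))  # invert the homography
    return w[:2] / w[2]
```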
I have two calibrated cameras looking at an overlapping scene. I am trying to estimate the pose of camera2 with respect to camera1 (because camera2 can be moving; but both camera1 and 2 will always have some features that are overlapping).
I am identifying features using SIFT, computing the fundamental matrix and eventually the essential matrix. Once I solve for R and t (one of the four possible solutions), I obtain the translation up-to-scale, but is it possible to somehow compute the translation in real world units? There are no objects of known size in the scene; but I do have the calibration data for both the cameras. I've gone through some info about Structure from Motion and stereo pose estimation, but the concept of scale and the correlation with real world translation is confusing me.
Thanks!
This is the classical scale problem with structure from motion.
The short answer is that you must have some other source of information in order to resolve scale.
This information can be about points in the scene (e.g. terrain map), or some sensor reading from the moving camera (IMU, GPS, etc.)
I am aware of the chessboard camera calibration technique, and have implemented it.
If I have 2 cameras viewing the same scene, and I calibrate both simultaneously with the chessboard technique, can I compute the rotation matrix and translation vector between them? How?
If you have the 3D camera coordinates of the corresponding points, you can compute the optimal rotation matrix and translation vector by Rigid Body Transformation
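A minimal sketch of that rigid-body fit (Kabsch / orthogonal Procrustes); A and B would hold the same physical points expressed in each camera's 3D coordinates:

```python
import numpy as np

def rigid_transform(A, B):
    """Least-squares R, t such that B ~ R @ A + t (Kabsch / Procrustes).

    A, B: 3xN arrays of corresponding 3D points in each camera's frame.
    """
    ca, cb = A.mean(axis=1, keepdims=True), B.mean(axis=1, keepdims=True)
    # SVD of the cross-covariance of the centered point sets.
    U, _, Vt = np.linalg.svd((B - cb) @ (A - ca).T)
    d = np.sign(np.linalg.det(U @ Vt))      # guard against reflections
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    t = cb - R @ ca
    return R, t
```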
If you are using OpenCV already, why not use cv::stereoCalibrate?
It returns the rotation and translation matrices. The only thing you have to do is make sure that the calibration chessboard is seen by both cameras.
The exact way is shown in the .cpp samples provided with the OpenCV library (I have version 2.2, and the samples were installed by default in /usr/local/share/opencv/samples).
The code example is called stereo_calib.cpp. Although it's not explained clearly what they are doing there (for that you might want to look at "Learning OpenCV"), it's something you can build on.
If I understood you correctly, you have two calibrated cameras observing a common scene, and you wish to recover their spatial arrangement. This is possible (provided you find enough image correspondences) but only up to an unknown factor on translation scale. That is, we can recover rotation (3 degrees of freedom, DOF) and only the direction of the translation (2 DOF). This is because we have no way to tell whether the projected scene is big and the cameras are far, or the scene is small and cameras are near. In the literature, the 5 DOF arrangement is termed relative pose or relative orientation (Google is your friend).
If your measurements are accurate and in general position, 6 point correspondences may be enough for recovering a unique solution. A relatively recent algorithm does exactly that.
Nister, D., "An efficient solution to the five-point relative pose problem," Pattern Analysis and Machine Intelligence, IEEE Transactions on , vol.26, no.6, pp.756,770, June 2004
doi: 10.1109/TPAMI.2004.17
Update:
Use a structure from motion/bundle adjustment package like Bundler to solve simultaneously for the 3D location of the scene and relative camera parameters.
Any such package requires several inputs:
camera calibrations that you have.
2D pixel locations of points of interest in the cameras (use an interest point detector such as Harris or DoG, the first stage of SIFT).
Correspondences between points of interest from each camera (use a descriptor like SIFT or SURF, or a matching cost like SSD, to do the matching).
Note that the solution is up to a certain scale ambiguity. You'll thus need to supply a distance measurement either between the cameras or between a pair of objects in the scene.
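For example, with a single known distance between two reconstructed points, resolving the ambiguity is just one scale factor (a hypothetical helper, not part of Bundler):

```python
import numpy as np

def fix_scale(points, i, j, known_dist):
    """Rescale an up-to-scale reconstruction (N x 3 points) so that the
    points with indices i and j end up known_dist apart.

    The same factor s must also be applied to the camera translations.
    """
    s = known_dist / np.linalg.norm(points[i] - points[j])
    return points * s
```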
Original answer (applies primarily to uncalibrated cameras as the comments kindly point out):
This camera calibration toolbox from Caltech can solve and visualize both the intrinsics (lens parameters, etc.) and the extrinsics (where the camera was positioned when each photo was taken). The latter is what you're interested in.
The Hartley and Zisserman blue book is also a great reference. In particular, you may want to look at the chapter on epipolar lines and fundamental matrix which is free online at the link.