3D stereo, bad 3D coordinates - opencv

I'm using stereo vision to obtain 3D reconstruction. I'm using opencv library.
I've implemented my code this way:
1) Stereo Calibration
2) Undistortion and rectification of the image pair
3) disparity map - using SGBM
4) 3D coordinates / depth map - using reprojectImageTo3D()
Results:
-Good disparity map, and good 3D reconstruction
-Bad 3D coordinate values: the distances don't correspond to reality.
The 3D distances (camera-to-object) have a 10 mm error that increases with distance. I've used various baselines and I always get the error.
When I compare the extrinsic parameters (the vector T from "stereoRectify"), the baseline matches.
So I don't know where the problem is.
Can someone help me please? Thanks in advance.
Calibration:
http://textuploader.com/ocxl
http://textuploader.com/ocxm

A 10 mm error can be reasonable for stereo vision solutions, depending of course on the sensor sensitivity, resolution, baseline and the distance to the object.
The increasing error with respect to the object's distance is also typical of the problem: stereo correspondence essentially performs triangulation from the two video sensors to the object, and the farther away the object is, the more a small error in the estimated angle translates into a large error along the depth axis. A good example is when the angle between the baseline and the ray to the object is almost a right angle (the rays from the two sensors are nearly parallel); any small positive error in estimating it can throw the estimated depth out to infinity.
The architecture you selected looks good. You can try increasing the sensor resolution, or dig into the calibration process, which has a lot of room for tuning in the OpenCV library: make sure only images taken while the chessboard was static are selected, use a wider variety of chessboard poses, keep adding images until the registration error between the two images drops below the maximum you can allow, etc.
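For reference, here is a minimal sketch of the pipeline the question describes (rectification, SGBM disparity, reprojectImageTo3D). Variable names such as K1, d1, K2, d2, R, T and the image size (w, h) are placeholders for your stereoCalibrate outputs. One pitfall worth ruling out: StereoSGBM returns disparity as 16x fixed-point values, which must be divided by 16 before reprojectImageTo3D, otherwise the reconstructed coordinates are systematically wrong.

```python
import cv2
import numpy as np

# Placeholders: K1, d1, K2, d2, R, T from cv2.stereoCalibrate;
# imgL, imgR are the raw left/right images of size (w, h).
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, d1, K2, d2, (w, h), R, T)

mapLx, mapLy = cv2.initUndistortRectifyMap(K1, d1, R1, P1, (w, h), cv2.CV_32FC1)
mapRx, mapRy = cv2.initUndistortRectifyMap(K2, d2, R2, P2, (w, h), cv2.CV_32FC1)
rectL = cv2.remap(imgL, mapLx, mapLy, cv2.INTER_LINEAR)
rectR = cv2.remap(imgR, mapRx, mapRy, cv2.INTER_LINEAR)

sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disp16 = sgbm.compute(rectL, rectR)          # int16, fixed point: true disparity * 16
disp = disp16.astype(np.float32) / 16.0      # real disparity in pixels

points3d = cv2.reprojectImageTo3D(disp, Q)   # 3D coordinates in the same units as T
```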

Related

Difference between stereo camera calibration vs two single camera calibrations using OpenCV

I have a vehicle with two cameras, left and right. Is there a difference between calibrating each camera separately vs performing "stereo calibration"? I am asking because I noticed in the OpenCV documentation that there is a stereoCalibrate function, and also a stereo calibration tool for MATLAB. If I do separate camera calibration on each and then perform a depth calculation using the undistorted images of each camera, will the results be the same?
I am not sure what the difference is between the two methods. I performed normal camera calibration for each camera separately.
For intrinsics, it doesn't matter. The added information ("pair of cameras") might make the calibration a little better though.
Stereo calibration gives you the extrinsics, i.e. transformation matrices between cameras. That's for... stereo vision. If you don't perform stereo calibration, you would lack the extrinsics, and then you can't do any depth estimation at all, because that requires the extrinsics.
TL;DR
You need stereo calibration if you want 3D points.
Long answer
There is a huge difference between single and stereo camera calibration.
The output of single camera calibration is intrinsic parameters only (i.e. the 3x3 camera matrix and a number of distortion coefficients, depending on the model used). In OpenCV this is accomplished by cv2.calibrateCamera. You may check my custom library that helps reduce the boilerplate.
When you do stereo calibration, its output is given by the intrinsics of both cameras and the extrinsic parameters.
In OpenCV this is done with cv2.stereoCalibrate. OpenCV fixes the world origin in the first camera and then you get a rotation matrix R and translation vector t to go from the first camera (origin) to the second one.
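A rough sketch of both calls (objpoints, imgpoints_l, imgpoints_r and the image size (w, h) are placeholders for your chessboard corner data):

```python
import cv2

# Single-camera calibration: intrinsics and distortion only.
rms_l, K1, d1, rvecs_l, tvecs_l = cv2.calibrateCamera(objpoints, imgpoints_l, (w, h), None, None)
rms_r, K2, d2, rvecs_r, tvecs_r = cv2.calibrateCamera(objpoints, imgpoints_r, (w, h), None, None)

# Stereo calibration: additionally returns the extrinsics R, T (pose of the
# second camera relative to the first). CALIB_FIX_INTRINSIC keeps K1/d1/K2/d2
# unchanged; drop it (or use CALIB_USE_INTRINSIC_GUESS) to let them be refined.
rms, K1, d1, K2, d2, R, T, E, F = cv2.stereoCalibrate(
    objpoints, imgpoints_l, imgpoints_r,
    K1, d1, K2, d2, (w, h),
    flags=cv2.CALIB_FIX_INTRINSIC)
```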
So, why do we need extrinsics? If you are using a stereo system for 3D scanning, then you need those (and the intrinsics) to do triangulation, in order to obtain 3D points in space: if you know the projection of a general point p onto both cameras, then you can calculate its position.
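A minimal triangulation sketch (assuming K1, K2, R, T from the stereo calibration above, and pts1, pts2 as 2xN arrays of matching, undistorted pixel coordinates):

```python
import cv2
import numpy as np

# Projection matrices with the world origin fixed in camera 1.
P1 = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])   # camera 1: K1 [I | 0]
P2 = K2 @ np.hstack([R, T.reshape(3, 1)])            # camera 2: K2 [R | t]

pts4d = cv2.triangulatePoints(P1, P2, pts1, pts2)    # homogeneous 4xN
pts3d = (pts4d[:3] / pts4d[3]).T                     # Nx3 points, in the units of T
```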
To add something to what @Christoph correctly answered before, the intrinsics should be almost the same; however, cv2.stereoCalibrate may improve the calculation of the intrinsics if the flag CALIB_FIX_INTRINSIC is not set. This happens because the system composed of the two cameras and the calibration board is solved as a whole by numerical optimization.

Use EMGU to get "real world" coordinates of pixel values

There are a number of calibration tutorials to calibrate camera images of chessboards in EMGU (OpenCV). They all end up calibrating and then undistorting an image for display. That's cool and all but I need to do machine vision where I am taking an image, identifying the location of a corner or blob or feature in the image and then translating the location of that feature in pixels into real world X, Y coordinates.
Pixel -> mm.
Is this possible with EMGU? If so, how? I'd hate to spend a bunch of time learning EMGU and then not be able to do this crucial function.
Yes, it's certainly possible; this is the "bread and butter" of OpenCV.
The calibration you are describing, in terms of removing distortions, is a prerequisite to this process. After which, the following applies:
The Intrinsic calibration, or "camera matrix" is the first of two required matrices. The second is the Extrinsic calibration of the camera which is essentially the 6 DoF transform that describes the physical location of the sensor center relative to a coordinate reference frame.
All of the Distortion Coefficients, Intrinsic, and Extrinsic Calibrations are available from a single function in Emgu.CV: CvInvoke.CalibrateCamera. This process is best explained, I'm sure, by one of the many tutorials available that you have described.
After that, CvInvoke.ProjectPoints applies the transforms above to map 3D world points onto 2D pixel locations. Going the other way (pixel to real-world X, Y) requires one extra constraint, typically that the feature lies on a known plane such as the calibration board (Z = 0); with that assumption the mapping can be inverted.
The key to doing this successfully is providing comprehensive IInputArray objectPoints and IInputArray imagePoints to CvInvoke.CalibrateCamera. Be sure to cause "excitation" by using many images, from many different perspectives.
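On the pixel -> mm part specifically: a single pixel only defines a ray, so you need the plane constraint mentioned above. The Emgu.CV calls are thin wrappers around the same OpenCV functions, so here is the idea as a hedged Python/OpenCV sketch (board_pts_3d and board_pts_2d are placeholders for the board corner coordinates in mm and their detected pixel positions; K and dist are the calibrated intrinsics):

```python
import cv2
import numpy as np

# Pose of the reference plane (Z = 0 in its own frame) relative to the camera,
# estimated from a single view of the calibration board.
ok, rvec, tvec = cv2.solvePnP(board_pts_3d, board_pts_2d, K, dist)
R_mat, _ = cv2.Rodrigues(rvec)

# Homography mapping plane coordinates (X, Y, 1) to pixel coordinates (u, v, 1).
H = K @ np.column_stack([R_mat[:, 0], R_mat[:, 1], tvec.ravel()])
H_inv = np.linalg.inv(H)

def pixel_to_plane_mm(u, v):
    """Map an (already undistorted) pixel to (X, Y) in mm on the Z = 0 plane."""
    xyw = H_inv @ np.array([u, v, 1.0])
    return xyw[0] / xyw[2], xyw[1] / xyw[2]
```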

3D reconstruction using stereo camera

I am trying to construct a 3D point cloud and measure the real sizes or distances of objects using a stereo camera. The cameras are stereo calibrated, and I find 3D points using the reprojection matrix Q and the disparity.
My problem is that the calculated sizes change depending on the distance from the cameras. I calculate the distance between two 3D points, which should be constant, but when the object gets closer to the camera the distance increases.
Am I missing something? The 3D coordinates should be in camera coordinates, not in pixel coordinates, so this seems inaccurate to me. Any idea?
You didn't mention how far apart your cameras are - the baseline. If they are very close together compared with the distance of the point that you are measuring, a slight inaccuracy in your measurement can lead to a big difference in the computed distance.
One way you can check if this is the problem is by testing with only lateral movement of the camera.
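As a back-of-the-envelope check of that effect: for a fronto-parallel stereo rig, depth is Z = f * B / d (f = focal length in pixels, B = baseline, d = disparity in pixels), so a disparity error of delta_d pixels produces a depth error of roughly Z^2 * delta_d / (f * B). The error grows quadratically with distance and shrinks with a larger baseline, which would also make measured object sizes drift with distance. The numbers below are purely illustrative assumptions:

```python
# Approximate depth error caused by a disparity error of delta_d pixels.
def depth_error(Z, f_px, baseline, delta_d=0.5):
    """Depth error at distance Z (same units as the baseline)."""
    return (Z ** 2) * delta_d / (f_px * baseline)

# Example: f = 800 px, baseline = 60 mm, half-pixel disparity error.
for Z in (500, 1000, 2000):                       # mm
    print(Z, round(depth_error(Z, 800, 60), 1))   # ~2.6, 10.4, 41.7 mm
```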

Triangulation of Rectified Image Points in Multiple Views

I am working with a set of calibrated images that form a ring around a foreground object (1). I used Fusiello's method (1) to rectify adjacent pairs of images, and then I performed disparity estimation.
When I take the matched points from a stereo pair and triangulate them, it forms an accurate point cloud. Unfortunately, when I triangulate the points from another stereo image pair, this point cloud never aligns correctly with the original cloud.
Should calibrated, rectified images' point clouds merge together automatically?
Thanks in advance for any help you can offer.
This might be due to the accuracy of calibration - both intrinsic (i.e. the same camera model - and how it handles distortion) and extrinsic (i.e. the camera pose in real space). Together, of course, these dictate the ultimate accuracy of your re-projection.
Do you have a measure of error for camera calibration - in terms of MSE re-projection?
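If only the overall RMS is available, the per-view reprojection error is easy to recompute from the calibration outputs; a minimal sketch in OpenCV terms (assuming the objpoints/imgpoints lists and the rvecs/tvecs, K, dist returned by cv2.calibrateCamera are still at hand):

```python
import cv2
import numpy as np

# RMS reprojection error for each calibration view.
for i, (objp, imgp) in enumerate(zip(objpoints, imgpoints)):
    proj, _ = cv2.projectPoints(objp, rvecs[i], tvecs[i], K, dist)
    diff = imgp.reshape(-1, 2) - proj.reshape(-1, 2)
    rms = np.sqrt(np.mean(np.sum(diff ** 2, axis=1)))
    print(f"view {i}: {rms:.3f} px")
```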
Cumulative error is often noticeable in my experience if you simply iterate over subsequent images. Some form of global optimisation (e.g. bundle adjustment) often needs to be performed first to correct the positions of all the camera poses.
The accuracy of your disparity estimation is also a factor. Not only in terms of the algorithm you are using, but also in relation to the stereo baseline and how it relates to the size/nature of the object in question (how concave/convex), and how many samples of the images you are taking (and the quality of those images - exposure/depth-of-field/etc.).
Fundamentally, just how "off" are your point clouds? Are they close to being aligned (you could do a bit of ICP before triangulation...). Are they closer in the "centre" of the re-projection? Are they worse for projections taken from opposing images on opposite sides of the object?
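If you want to try the rough ICP alignment mentioned above, one convenient option (an assumption on my part, it is not part of your described toolchain) is the Open3D library, given the two clouds as Nx3 numpy arrays:

```python
import numpy as np
import open3d as o3d

# cloud_a, cloud_b: Nx3 numpy arrays of 3D points from two different stereo pairs.
src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(cloud_a))
dst = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(cloud_b))

result = o3d.pipelines.registration.registration_icp(
    src, dst,
    max_correspondence_distance=5.0,   # in the clouds' units; tune to your scale
    init=np.eye(4),
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())

print(result.fitness, result.inlier_rmse)
print(result.transformation)           # 4x4 rigid transform aligning src onto dst
```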
Remember as well that (due to the discrete sampling) you shouldn't expect points to ever be re-projected exactly on top of one another. Some form of binning operation during the triangulation pipeline usually occurs to handle this (hence most of the research work in visual hull -> voxels -> marching cubes -> triangulated surface around this...)
Have you checked out MeshLab BTW?

Volume of the camera calibration

I am dealing with a problem that concerns camera calibration. I need calibrated cameras to carry out measurements of 3D objects. I am using OpenCV for the calibration, and I am wondering how I can predict or calculate the volume in which the camera is well calibrated. Is there a way to increase this volume, especially in the direction of the optical axis? Does increasing the movement range of the calibration target in the 'z' direction make a sufficient difference?
I think you confuse a few key things in your question:
Camera calibration - this means finding out the matrices (intrinsic and extrinsic) that describe the camera position, rotation, up vector, distortion, optical center etc. etc.
Epipolar Rectification - this means virtually "rotating" the image planes so that they become coplanar (parallel). This simplifies the stereo reconstruction algorithms.
For camera calibration you do not need to care about any volumes - there are no volumes in which the camera is well or badly calibrated. If you use the chessboard pattern calibration, your cameras are either calibrated or not.
When dealing with rectification, you want to know which areas of the rectified images correspond, and also to maximize these areas. OpenCV lets you choose between two extremes: either keep only valid pixels, cutting out everything that doesn't fit into the rectangular area, or include all pixels, even invalid ones.
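Concretely, that trade-off is the alpha parameter of stereoRectify; a sketch in OpenCV's Python API (K1, d1, K2, d2, R, T and the image size (w, h) are assumed to come from your stereo calibration):

```python
import cv2

# alpha=0: crop so the rectified images contain only valid pixels.
# alpha=1: keep every source pixel, including the black invalid regions.
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
    K1, d1, K2, d2, (w, h), R, T, alpha=0)

print(roi1, roi2)   # rectangles of guaranteed-valid pixels in the rectified images
```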
OpenCV documentation has some nice, more detailed descriptions here: http://opencv.willowgarage.com/documentation/camera_calibration_and_3d_reconstruction.html
