Merging depth maps for trinocular stereo - opencv

I have a parallel trinocular setup where all 3 cameras are alligned in a collinear fashion as depicted below.
Left-Camera------------Centre-Camera---------------------------------Right-Camera
The baseline (distance between cameras) between left and centre camera is the shortest and the baseline between left and right camera is the longest.
In theory I can obtain 3 sets of disparity images using different camera combinations (L-R, L-C and C-R).I can generate depth maps (3D points) for each disparity map using Triangulation. I now have 3 depth maps.
The L-C combination has higher depth accuracy (measured distance is more accurate) for objects that are near (since the baseline is short) whereas
the L-R combination has higher depth accuracy for objects that are far(since the baseline is long). Similarly the C-R combination is accurate for objects at medium distance.
In stereo setups, normally we define the left (RGB) image as the reference image. In my project, by thresholding the depth values, I obtain an ROI on the reference image. For example I find all the pixels that have a depth value between 10-20m and find their respective pixel locations. In this case, I have a relationship between 3D points and their corresponding pixel location.
Since in normal stereo setups, we can have higher depth accuracy only for one of the two regions depending upon the baseline (near and far), I plan on using 3 cameras. This helps me to generate 3D points of higher accuracy for three regions (near, medium and far).
I now want to merge the 3 depth maps to obtain a global map. My problems are as follows -
How to merge the three depth maps ?
After merging, how do I know which depth value corresponds to which pixel location in the reference (Left RGB) image ?
Your help will be much appreciated :)

1) I think that simple "merging" of depth maps (as matrices of values) is not possible, if you are thinking of a global 2D depth map as an image or a matrix of depth values. You can consider instead to merge the 3 set of 3D points with some similarity criteria like the distance (refining your point cloud). If they are too close, delete one of theme (pseudocode)
for i in range(points):
for j in range(i,points):
if distance(i,j) < treshold
delete(j)
or delete the 2 points and add a point that have average coordinates
2) From point one, this question became "how to connect a 3D point to the related pixel in the left image" (it is the only interpretation).
The answer simply is: use the projection equation. If you have K (intrinsic matrix), R (rotation matrix) and t (translation vector) from calibration of the left camera, join R and t in a 3x4 matrix
[R|t]
and then connect the M 3D point in 4-dimensional coordinates (X,Y,Z,1) as an m point (u,v,w)
m = K*[R|t]*M
divide m by its third coordinate w and you obtain
m = (u', v', 1)
u' and v' are the pixel coordinates in the left image.

Related

Measure distance to object with a single camera in a static scene

let's say I am placing a small object on a flat floor inside a room.
First step: Take a picture of the room floor from a known, static position in the world coordinate system.
Second step: Detect the bottom edge of the object in the image and map the pixel coordinate to the object position in the world coordinate system.
Third step: By using a measuring tape measure the real distance to the object.
I could move the small object, repeat this three steps for every pixel coordinate and create a lookup table (key: pixel coordinate; value: distance). This procedure is accurate enough for my use case. I know that it is problematic if there are multiple objects (an object could cover an other object).
My question: Is there an easier way to create this lookup table? Accidentally changing the camera angle by a few degrees destroys the hard work. ;)
Maybe it is possible to execute the three steps for a few specific pixel coordinates or positions in the world coordinate system and perform some "calibration" to calculate the distances with the computed parameters?
If the floor is flat, its equation is that of a plane, let
a.x + b.y + c.z = 1
in the camera coordinates (the origin is the optical center of the camera, XY forms the focal plane and Z the viewing direction).
Then a ray from the camera center to a point on the image at pixel coordinates (u, v) is given by
(u, v, f).t
where f is the focal length.
The ray hits the plane when
(a.u + b.v + c.f) t = 1,
i.e. at the point
(u, v, f) / (a.u + b.v + c.f)
Finally, the distance from the camera to the point is
p = √(u² + v² + f²) / (a.u + b.v + c.f)
This is the function that you need to tabulate. Assuming that f is known, you can determine the unknown coefficients a, b, c by taking three non-aligned points, measuring the image coordinates (u, v) and the distances, and solving a 3x3 system of linear equations.
From the last equation, you can then estimate the distance for any point of the image.
The focal distance can be measured (in pixels) by looking at a target of known size, at a known distance. By proportionality, the ratio of the distance over the size is f over the length in the image.
Most vision libraries (including opencv) have built in functions that will take a couple points from a camera reference frame and the related points from a Cartesian plane and generate your warp matrix (affine transformation) for you. (some are fancy enough to include non-linearity mappings with enough input points, but that brings you back to your time to calibrate issue)
A final note: most vision libraries use some type of grid to calibrate off of ie a checkerboard patter. If you wrote your calibration to work off of such a sheet, then you would only need to measure distances to 1 target object as the transformations would be calculated by the sheet and the target would just provide the world offsets.
I believe what you are after is called a Projective Transformation. The link below should guide you through exactly what you need.
Demonstration of calculating a projective transformation with proper math typesetting on the Math SE.
Although you can solve this by hand and write that into your code... I strongly recommend using a matrix math library or even writing your own matrix math functions prior to resorting to hand calculating the equations as you will have to solve them symbolically to turn it into code and that will be very expansive and prone to miscalculation.
Here are just a few tips that may help you with clarification (applying it to your problem):
-Your A matrix (source) is built from the 4 xy points in your camera image (pixel locations).
-Your B matrix (destination) is built from your measurements in in the real world.
-For fast recalibration, I suggest marking points on the ground to be able to quickly place the cube at the 4 locations (and subsequently get the altered pixel locations in the camera) without having to remeasure.
-You will only have to do steps 1-5 (once) during calibration, after that whenever you want to know the position of something just get the coordinates in your image and run them through step 6 and step 7.
-You will want your calibration points to be as far away from eachother as possible (within reason, as at extreme distances in a vanishing point situation, you start rapidly losing pixel density and therefore source image accuracy). Make sure that no 3 points are colinear (simply put, make your 4 points approximately square at almost the full span of your camera fov in the real world)
ps I apologize for not writing this out here, but they have fancy math editing and it looks way cleaner!
Final steps to applying this method to this situation:
In order to perform this calibration, you will have to set a global home position (likely easiest to do this arbitrarily on the floor and measure your camera position relative to that point). From this position, you will need to measure your object's distance from this position in both x and y coordinates on the floor. Although a more tightly packed calibration set will give you more error, the easiest solution for this may simply be to have a dimension-ed sheet(I am thinking piece of printer paper or a large board or something). The reason that this will be easier is that it will have built in axes (ie the two sides will be orthogonal and you will just use the four corners of the object and used canned distances in your calibration). EX: for a piece of paper your points would be (0,0), (0,8.5), (11,8.5), (11,0)
So using those points and the pixels you get will create your transform matrix, but that still just gives you a global x,y position on axes that may be hard to measure on (they may be skew depending on how you measured/ calibrated). So you will need to calculate your camera offset:
object in real world coords (from steps above): x1, y1
camera coords (Xc, Yc)
dist = sqrt( pow(x1-Xc,2) + pow(y1-Yc,2) )
If it is too cumbersome to try to measure the position of the camera from global origin by hand, you can instead measure the distance to 2 different points and feed those values into the above equation to calculate your camera offset, which you will then store and use anytime you want to get final distance.
As already mentioned in the previous answers you'll need a projective transformation or simply a homography. However, I'll consider it from a more practical view and will try to summarize it short and simple.
So, given the proper homography you can warp your picture of a plane such that it looks like you took it from above (like here). Even simpler you can transform a pixel coordinate of your image to world coordinates of the plane (the same is done during the warping for each pixel).
A homography is basically a 3x3 matrix and you transform a coordinate by multiplying it with the matrix. You may now think, wait 3x3 matrix and 2D coordinates: You'll need to use homogeneous coordinates.
However, most frameworks and libraries will do this handling for you. What you need to do is finding (at least) four points (x/y-coordinates) on your world plane/floor (preferably the corners of a rectangle, aligned with your desired world coordinate system), take a picture of them, measure the pixel coordinates and pass both to the "find-homography-function" of your desired computer vision or math library.
In OpenCV that would be findHomography, here an example (the method perspectiveTransform then performs the actual transformation).
In Matlab you can use something from here. Make sure you are using a projective transformation as transform type. The result is a projective tform, which can be used in combination with this method, in order to transform your points from one coordinate system to another.
In order to transform into the other direction you just have to invert your homography and use the result instead.

How do I reconstruct 3D points from rectified images at different horizontal and vertical positions?

I am familiar with reconstruction of 3D points from stereo rectified pairs. The equations for calculating coordinate estimates are:
Z = fB/D
X = uZ/f
Y = vZ/f
Where f = focal length, B = baseline, D = disparity, (u,v) are the 2D projected image coordinates.
Say I now have four cameras in a 2x2 grid. I have identified and matched fiducial markers in each image. I now want to estimate 3D point position from these projected points.
My question has two parts:
1) How does the triangulation equation change when images are not on the same horizontal baseline?
2) How do I derive an estimate from multiple pair-wise estimates?
What you are looking for is triangulation. A good starting point is to read the paper by Hartley and Sturm. There is a nice implementation in MATLAB's image processing toolbox, a number of googlable others out there, and, finally, it's not hard to write one's own on the basis of the abovementioned paper.

Triangulation to find distance to the object- Image to world coordinates

Localization of an object specified in the image.
I am working on the project of computer vision to find the distance of an object using stereo images.I followed the following steps using OpenCV to achieve my objective
1. Calibration of camera
2. Surf matching to find fundamental matrix
3. Rotation and Translation vector using svd as method is described in Zisserman and Hartley book.
4. StereoRectify to get the projection matrix P1, P2 and Rotation matrices R1, R2. The Rotation matrices can also be find using Homography R=CameraMatrix.inv() H Camera Matrix.
Problems:
i triangulated point using least square triangulation method to find the real distance to the object. it returns value in the form of [ 0.79856 , .354541 .258] . How will i map it to real world coordinates to find the distance to an object.
http://www.morethantechnical.com/2012/01/04/simple-triangulation-with-opencv-from-harley-zisserman-w-code/
Alternative approach:
Find the disparity between the object in two images and find the depth using the given formula
Depth= ( focal length * baseline ) / disparity
for disparity we have to perform the rectification first and the points must be undistorted. My rectification images are black.
Please help me out.It is important
Here is the detail explanation of how i implemented the code.
Calibration of Camera using Circles grid to get the camera matrix and Distortion coefficient. The code is given on the Github (Andriod).
2.Take two pictures of a car. First from Left and other from Right. Take the sub-image and calculate the -fundmental matrix- essential matrix- Rotation matrix- Translation Matrix....
3.I have tried to projection in two ways.
Take the first image projection as identity matrix and make a second project 3x4d through rotation and translation matrix and perform Triangulation.
Get the Projection matrix P1 and P2 from Stereo Rectify to perform Triangulation.
My object is 65 meters away from the camera and i dont know how to calculate this true this based on the result of triangulation in the form of [ 0.79856 , .354541 .258]
Question: Do i have to do some extra calibration to get the result. My code is not based to know the detail of geometric size of the object.
So you already computed the triangulation? Well, then you have points in camera coordinates, i.e. in the coordinate frame centered on one of the cameras (the left or right one depending on how your code is written and the order in which you feed your images to it).
What more do you want? The vector length (square root of the sum of the square coordinates) of those points is their estimated distance from the same camera. If you want their position in some other "world" coordinate system, you need to give the coordinate transform between that system and the camera - presumably through a calibration procedure.

Recover plane from homography

I have used openCV to calculate the homography relating to views of the same plane by using features and matching them. Is there any way to recover the plane itsself or the plane normal from this homography? (I am looking for an equation where H is the input and the normal n is the output.)
If you have the calibration of the cameras, you can extract the normal of the plane, but not the distance to the plane (i.e. the transformation that you obtain is up to scale), as Wikipedia explains. I don't know any implementation to do it, but here you are a couple of papers that deal with that problem (I warn you it is not straightforward): Faugeras & Lustman 1988, Vargas & Malis 2005.
You can recover the real translation of the transformation (i.e. the distance to the plane) if you have at least a real distance between two points on the plane. If that is the case, the easiest way to go with OpenCV is to first calculate the homography, then obtain four points on the plane with their 2D coordinates and the real 3D ones (you should be able to obtain them if you have a real measurement on the plane), and using PnP finally. PnP will give you a real transformation.
Rectifying an image is defined as making epipolar lines horizontal and lying in the same row in both images. From your description I get that you simply want to warp the plane such that it is parallel to the camera sensor or the image plane. This has nothing to do with rectification - I’d rather call it an obtaining a bird’s-eye view or a top view.
I see the source of confusion though. Rectification of images usually involves multiplication of each image with a homography matrix. In your case though each point in sensor plane b:
Xb = Hab * Xa = (Hb * Ha^-1) * Xa, where Ha is homography from the plane in the world to the sensor a; Ha and intrinsic camera matrix will give you a plane orientation but I don’t see an easy way to decompose Hab into Ha and Hb.
A classic (and hard) way is to find a Fundamental matrix, restore the Essential matrix from it, decompose the Essential matrix into camera rotation and translation (up to scale), rectify both images, perform a dense stereo, then fit a plane equation into 3d points you reconstruct.
If you interested in the ground plane and you operate an embedded device though, you don’t even need two frames - a top view can be easily recovered from a single photo, camera elevation from the ground (H) and a gyroscope (or orientation vector) readings. A simple diagram below explains the process in 2D case: first picture shows how to restore Z (depth) coordinate to every point on the ground plane; the second picture shows a plot of the top view with vertical axis being z and horizontal axis x = (img.col-w/2)*Z/focal; Here is img.col is image column, w - image width, and focal is camera focal length. Note that a camera frustum looks like a trapezoid in a birds eye view.

Project 2d points in camera 1 image to camera 2 image after a stereo calibration

I am doing stereo calibration of two cameras (let's name them L and R) with opencv. I use 20 pairs of checkerboard images and compute the transformation of R with respect to L. What I want to do is use a new pair of images, compute the 2d checkerboard corners in image L, transform those points according to my calibration and draw the corresponding transformed points on image R with the hope that they will match the corners of the checkerboard in that image.
I tried the naive way of transforming the 2d points from [x,y] to [x,y,1], multiply by the 3x3 rotation matrix, add the rotation vector and then divide by z, but the result is wrong, so I guess it's not that simple (?)
Edit (to clarify some things):
The reason I want to do this is basically because I want to validate the stereo calibration on a new pair of images. So, I don't actually want to get a new 2d transformation between the two images, I want to check if the 3d transformation I have found is correct.
This is my setup:
I have the rotation and translation relating the two cameras (E), but I don't have rotations and translations of the object in relation to each camera (E_R, E_L).
Ideally what I would like to do:
Choose the 2d corners in image from camera L (in pixels e.g. [100,200] etc).
Do some kind of transformation on the 2d points based on matrix E that I have found.
Get the corresponding 2d points in image from camera R, draw them, and hopefully they match the actual corners!
The more I think about it though, the more I am convinced that this is wrong/can't be done.
What I am probably trying now:
Using the intrinsic parameters of the cameras (let's say I_R and I_L), solve 2 least squares systems to find E_R and E_L
Choose 2d corners in image from camera L.
Project those corners to their corresponding 3d points (3d_points_L).
Do: 3d_points_R = (E_L).inverse * E * E_R * 3d_points_L
Get the 2d_points_R from 3d_points_R and draw them.
I will update when I have something new
It is actually easy to do that but what you're making several mistakes. Remember after stereo calibration R and L relate the position and orientation of the second camera to the first camera in the first camera's 3D coordinate system. And also remember to find the 3D position of a point by a pair of cameras you need to triangulate the position. By setting the z component to 1 you're making two mistakes. First, most likely you have used the common OpenCV stereo calibration code and have given the distance between the corners of the checker board in cm. Hence, z=1 means 1 cm away from the center of camera, that's super close to the camera. Second, by setting the same z for all the points you are saying the checker board is perpendicular to the principal axis (aka optical axis, or principal ray), while most likely in your image that's not the case. So you're transforming some virtual 3D points first to the second camera's coordinate system and then projecting them onto the image plane.
If you want to transform just planar points then you can find the homography between the two cameras (OpenCV has the function) and use that.

Resources