Can somebody explain to me what exactly a disparity map returns? There is not much given in the documentation, and I have a few questions related to it.
Does it return difference values of pixels with respect to both images?
How do I use disparity values in the formula for depth estimation, i.e.
Depth = focalLength*Baseline/Disparity
I have read somewhere that a disparity map gives a function of depth, f(z).
Please explain what this means. If depth is purely an absolute value, how can it be generated as a function, or is it a function with respect to the pixels?
The difference d = pl − pr of two corresponding image points is called disparity.
Here, pl is the position of the point in the left stereo image and pr is the position of the point in the right stereo image.
For parallel optical axes, the disparity is d = xl − xr.
⇒ The search for depth information is equivalent to the search for disparity, i.e. for corresponding pixels.
The distance is inversely proportional to the disparity.
The disparity values are visualized in a so-called disparity map: each disparity value for each pixel in the reference image (here: the left one) is coded as a grayscale value. For pixels that do not have any correspondence, a grayscale value (here: black) is also defined. The so-called ground-truth map is a disparity map that contains the ideal solution of the correspondence problem.
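For illustration, such a grayscale disparity map could be produced along these lines; this is only a sketch, assuming disparity is a floating-point NumPy array from some stereo matcher and that pixels without a correspondence are marked with values <= 0:

import cv2
import numpy as np

# disparity: float array from a stereo matcher; values <= 0 mean "no match".
vis = np.zeros(disparity.shape, dtype=np.uint8)
valid = disparity > 0
if valid.any():
    d = disparity[valid]
    # Stretch the valid disparity range to 0-255; unmatched pixels stay black.
    vis[valid] = np.uint8(255 * (d - d.min()) / max(d.max() - d.min(), 1e-6))
cv2.imwrite("disparity_map.png", vis)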
Relation between Disparity and Depth information:
The following image represents two cameras (left and right) and the triangulation used to find the depth of a point P(x_w, z_w).
The resulting depth is given by Z = f * b / d, where f is the focal length, b is the baseline and d = x_l − x_r is the disparity.
So it can be seen that the depth is inversely proportional to the disparity.
UPDATE:
To calculate the disparity, you need two images: (1) the left image and (2) the right image. Let's say there is a pixel at position (60, 30) in the left image and the same scene point appears at position (40, 30) in the right image; then your disparity will be 60 − 40 = 20. So the disparity map gives you the difference between the positions of corresponding pixels in the left and right images. If a pixel is present in the left image but has no match in the right image, the value at that position in the disparity map will be zero. Once you get the disparity value for each pixel of the left image, you can easily calculate the depth using the formula given at the end of my answer.
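To make the depth formula concrete, here is a minimal NumPy sketch of that conversion; the focal length, baseline and the tiny disparity array below are made-up placeholder values, and zero-disparity pixels are treated as "no correspondence":

import numpy as np

# Hypothetical calibration values -- replace with your own.
focal_length_px = 700.0   # focal length in pixels
baseline_m = 0.12         # distance between the two cameras, in meters

# Toy disparity map (in pixels); 0 marks pixels without a correspondence.
disparity = np.array([[20.0, 18.5, 0.0],
                      [21.0, 19.0, 17.5]])

# Depth = focalLength * Baseline / Disparity, only where the disparity is valid.
depth = np.zeros_like(disparity)
valid = disparity > 0
depth[valid] = focal_length_px * baseline_m / disparity[valid]
print(depth)   # depth in meters; invalid pixels stay 0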
Related
As a result of the Faster R-CNN method of object detection, I have obtained a set of boxes of intensity values (each bounding box can be thought of as a 3D matrix with depth 3 for RGB intensity, a width and a height, which can then be converted into a 2D matrix by taking the grayscale) corresponding to the region containing the object. What I want to do is obtain the corresponding coordinate points in the original image for each cell of intensity inside the bounding box. Any ideas how to do so?
From what I understand, you got an R-CNN model that outputs cropped pieces of the input image and you now want to trace those output crops back to their coordinates in the original image.
What you can do is simply use a patch-similarity-measure to find the original position.
Since the output crop should look exactly like the corresponding region in the original image, just use a pixel-based distance:
Find the place in the image with the smallest distance (should be zero) and from that you can find your desired coordinates.
In Python:
import numpy as np

# Exhaustive search: slide the crop over the original (grayscale) image and
# keep the position with the smallest sum of absolute pixel differences.
d_min = float("inf")
coord = None
crop_size = crop.shape
for x in range(org_image.shape[0] - crop_size[0] + 1):
    for y in range(org_image.shape[1] - crop_size[1] + 1):
        patch = org_image[x:x + crop_size[0], y:y + crop_size[1]].astype(int)
        d = np.sum(np.abs(patch - crop.astype(int)))
        if d <= d_min:
            d_min = d
            coord = [x, y]
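As a side note, OpenCV's template matching does the same exhaustive search much faster; a sketch, assuming org_image and crop are grayscale uint8 arrays:

import cv2

# Squared-difference template matching; the global minimum marks the crop location.
result = cv2.matchTemplate(org_image, crop, cv2.TM_SQDIFF)
_, _, min_loc, _ = cv2.minMaxLoc(result)
coord = [min_loc[1], min_loc[0]]   # minMaxLoc returns (x, y); swap to (row, col)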
However, your model should have that info available in it (after all, it crops the output based on some coordinates). It might help if you add some info on your implementation.
I have a stereo pair, and a map of vectors that represent the pixel-pixel disparity between my left image to my right image. I would like to project my left image into my right image, using the disparity map.
I am stuck on how to achieve this with some accuracy, given that the disparity map vectors are floating point, not clean integer values that map directly to the pixels in my right image.
First question: are your images rectified? (See: https://en.wikipedia.org/wiki/Image_rectification) If yes, you can generate the "right image" from the given left image and the disparity map by changing each pixel's column (or x) coordinate by the disparity amount. There will be some blank pixels due to occlusions, however.
Sub-pixel accuracy images, however, cannot be generated in this way, as you noted. One thing you can do is round the disparities to integer values and create the image. Another thing you can do is create an image that is 2x or 5x or 10x (or however many times) larger than your input image, and then use this additional resolution to get "sub-pixel" accuracy for your projection image. But you will get some holes this way, and would likely need to interpolate to generate a piece-wise smooth final result.
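As an illustration of the rounding option, a minimal NumPy sketch that forward-warps a rectified left image into the right view could look like this; left_img and disparity are assumed to be same-sized arrays, and occluded target pixels are simply left at zero:

import numpy as np

def warp_left_to_right(left_img, disparity):
    # For rectified images the warp only shifts columns: x_right = x_left - d.
    h, w = left_img.shape[:2]
    right_img = np.zeros_like(left_img)
    d = np.rint(disparity).astype(int)            # round float disparities to pixels
    xs = np.arange(w)[None, :].repeat(h, axis=0)  # column index of every pixel
    x_dst = xs - d                                # target column in the right image
    valid = (x_dst >= 0) & (x_dst < w)
    ys = np.arange(h)[:, None].repeat(w, axis=1)
    right_img[ys[valid], x_dst[valid]] = left_img[valid]
    return right_img   # holes (occlusions) remain zero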
I have a parallel trinocular setup where all 3 cameras are aligned in a collinear fashion as depicted below.
Left-Camera------------Centre-Camera---------------------------------Right-Camera
The baseline (distance between cameras) between left and centre camera is the shortest and the baseline between left and right camera is the longest.
In theory I can obtain 3 sets of disparity images using different camera combinations (L-R, L-C and C-R). I can generate depth maps (3D points) for each disparity map using triangulation. I now have 3 depth maps.
The L-C combination has higher depth accuracy (the measured distance is more accurate) for objects that are near (since the baseline is short), whereas the L-R combination has higher depth accuracy for objects that are far (since the baseline is long). Similarly, the C-R combination is accurate for objects at medium distance.
In stereo setups, normally we define the left (RGB) image as the reference image. In my project, by thresholding the depth values, I obtain an ROI on the reference image. For example I find all the pixels that have a depth value between 10-20m and find their respective pixel locations. In this case, I have a relationship between 3D points and their corresponding pixel location.
Since in normal stereo setups, we can have higher depth accuracy only for one of the two regions depending upon the baseline (near and far), I plan on using 3 cameras. This helps me to generate 3D points of higher accuracy for three regions (near, medium and far).
I now want to merge the 3 depth maps to obtain a global map. My problems are as follows -
How to merge the three depth maps ?
After merging, how do I know which depth value corresponds to which pixel location in the reference (Left RGB) image ?
Your help will be much appreciated :)
1) I think that simple "merging" of depth maps (as matrices of values) is not possible if you are thinking of a global 2D depth map as an image or a matrix of depth values. You can instead consider merging the 3 sets of 3D points with some similarity criterion such as the distance (refining your point cloud). If two points are too close, delete one of them (sketch below):
import numpy as np

# points: iterable of (x, y, z) tuples; threshold: minimum allowed separation.
threshold = 0.01   # placeholder value
merged = []
for p in points:
    if all(np.linalg.norm(np.asarray(p) - q) >= threshold for q in merged):
        merged.append(np.asarray(p, dtype=float))
Alternatively, delete both points and add a point that has their average coordinates.
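A runnable sketch of that averaging variant, using the same hypothetical points list and threshold as above:

import numpy as np

merged = []
for p in points:
    p = np.asarray(p, dtype=float)
    for k, q in enumerate(merged):
        if np.linalg.norm(p - q) < threshold:
            merged[k] = (p + q) / 2.0    # replace the close pair by its midpoint
            break
    else:
        merged.append(p)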
2) Given point 1, this question becomes "how to connect a 3D point to the related pixel in the left image" (that is the only sensible interpretation).
The answer simply is: use the projection equation. If you have K (intrinsic matrix), R (rotation matrix) and t (translation vector) from calibration of the left camera, join R and t in a 3x4 matrix
[R|t]
and then project the 3D point M, written in homogeneous coordinates (X, Y, Z, 1), onto an image point m = (u, v, w):
m = K*[R|t]*M
divide m by its third coordinate w and you obtain
m = (u', v', 1)
u' and v' are the pixel coordinates in the left image.
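A minimal NumPy sketch of that projection; K, R and t are assumed to come from your calibration of the left camera:

import numpy as np

def project_to_left_image(M_xyz, K, R, t):
    # Build the 3x4 projection matrix P = K * [R|t].
    P = K @ np.hstack([R, t.reshape(3, 1)])
    X, Y, Z = M_xyz
    u, v, w = P @ np.array([X, Y, Z, 1.0])   # m = K*[R|t]*M in homogeneous form
    return u / w, v / w                      # pixel coordinates (u', v')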
I am not able to understand the formula.
What do W (window) and intensity mean in the formula?
I found this formula in the OpenCV docs:
http://docs.opencv.org/trunk/doc/py_tutorials/py_feature2d/py_features_harris/py_features_harris.html
For a grayscale image, the intensity levels (0-255) tell you how bright the pixel is; I hope you already know about this.
So, now the explanation of your formula is below:
Aim: We want to find those points which have maximum variation in terms of intensity level in all directions, i.e. the points which are very unique in a given image.
I(x,y): This is the intensity value of the current pixel which you are processing at the moment.
I(x+u,y+v): This is the intensity of another pixel which lies at an offset of (u,v) from the current pixel (mentioned above), which is located at (x,y) with intensity I(x,y).
I(x+u,y+v) - I(x,y): This equation gives you the difference between the intensity levels of two pixels.
W(u,v): You don't compare the current pixel with any other pixel located at some random position. You prefer to compare the current pixel with its neighbors, so you choose some values for "u" and "v", as you do when applying a Gaussian mask/mean filter etc. So, basically, W(u,v) represents the window within which you compare the intensity of the current pixel with that of its neighbors.
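To make the roles of these terms concrete, here is a minimal NumPy sketch that evaluates the summed, squared intensity change for one shift (u, v) over a small window around a pixel; a plain box window (all weights equal to 1) is assumed, and the pixel is assumed to be far enough from the image border:

import numpy as np

def intensity_change(img, cx, cy, u, v, half_win=2):
    # Sum over the window of W * (I(x+u, y+v) - I(x, y))^2, with W = 1 (box window).
    img = img.astype(float)
    E = 0.0
    for y in range(cy - half_win, cy + half_win + 1):
        for x in range(cx - half_win, cx + half_win + 1):
            diff = img[y + v, x + u] - img[y, x]   # intensity difference of the two pixels
            E += diff ** 2
    return E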
This link explains all your doubts.
For visualizing the algorithm, consider the window function as a BoxFilter, Ix as a Sobel derivative along x-axis and Iy as a Sobel derivative along y-axis.
http://docs.opencv.org/doc/tutorials/imgproc/imgtrans/sobel_derivatives/sobel_derivatives.html will be useful to understand the final equations in the above pdf.
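Following that description, a rough sketch of the Harris response computed by hand with Sobel derivatives and a box-filter window might look like this; the file name, window size and the constant k are placeholders:

import cv2
import numpy as np

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Ix, Iy: Sobel derivatives along the x- and y-axis.
Ix = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)
Iy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)

# Window function as a box filter: average the products over a local window.
Sxx = cv2.boxFilter(Ix * Ix, cv2.CV_32F, (5, 5))
Syy = cv2.boxFilter(Iy * Iy, cv2.CV_32F, (5, 5))
Sxy = cv2.boxFilter(Ix * Iy, cv2.CV_32F, (5, 5))

# Harris response R = det(M) - k * trace(M)^2; large R indicates a corner.
k = 0.04
R = (Sxx * Syy - Sxy * Sxy) - k * (Sxx + Syy) ** 2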
I have successfully created a satisfying disparity map using the cv::StereoSGBM class in OpenCV 2.4.5, with 256 different disparity levels. I have divided all the disparity values by 16, as indicated in the documentation, but this final map can't possibly contain the "true" disparities: if I add to an (x, y) pixel's horizontal coordinate its computed disparity (x + d), the resulting coordinates (x + d, y) in the other image correspond to a completely different pixel.
I guess the problem stems from the fact that I parameterized stereoSGBM initially for 256 disparity levels, but the actual maximum disparity (in pixels) is way smaller. Since I don't really know the actual maximum disparity, I can't just normalize accordingly the values computed by stereoSGBM.
I need the actual disparity values to compute dense stereo correspondences (necessary for triangulation and 3D reconstruction).
I had the same problem and this is how I fixed it:
Your actual (raw) disparity is what the StereoSGBM class gives as output. You scale it only so that you can see it in an image format.
In order to obtain corresponding pixels of the left and the right images, use the disparity matrix given by the StereoSGBM class as is, i.e. without the normalization you would apply to display it as an image. Then you'll get the corresponding points properly.
But be careful when you're accessing these values: the data type is 16-bit signed (CV_16S / int16). You'll run into exceptions if you use the wrong data type.
This is what you'll do for corresponding points:
Suppose I1 is your left image and I2 is your right image, and d is the disparity value in pixels at some row i and column j (if you read it from the raw CV_16S output, divide that value by 16 first, since it stores disparity × 16); then I1(i,j) and I2(i,j−d) will be the corresponding points, assuming horizontal stereo.
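A minimal OpenCV-Python sketch of this, using the current StereoSGBM_create factory (the 2.4.x constructor differs) and made-up matcher parameters; the raw output is 16-bit fixed point, so it is divided by 16 before looking up correspondences:

import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Placeholder parameters -- tune numDisparities/blockSize for your own setup.
sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)

raw = sgbm.compute(left, right)           # int16, stores disparity * 16
disp = raw.astype(np.float32) / 16.0      # true disparity in pixels

i, j = 100, 200                           # some pixel (row, column) in the left image
d = disp[i, j]
if d > 0:                                 # non-positive means no valid match
    corresponding = (i, j - int(round(d)))   # matching pixel in the right image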
I did some tests and the true disparity is given to you when you divide the output of SGBM by 16. Here you see two histograms of the disparity output for -128 to 128 and -128 to 256 disparity: (the displayed disparity is divided by 16)
You can see that the middle area between −50 and 128 is equivalent, so the values of num_disparity and min_disparity don't affect the scaled output of SGBM.
Note that the disparity scales with the baseline and the focal length of the cameras used. So if you work with several disparity maps from multiple stereo pairs, each map is scaled by that pair's baseline and, if different cameras were used, by their specific focal lengths.