How do I project an image by a disparity map? - opencv

I have a stereo pair and a map of vectors that represent the pixel-to-pixel disparity between my left image and my right image. I would like to project my left image into my right image using the disparity map.
I am stuck on how to achieve this with some accuracy, given that the disparity map vectors are floating point, not clean integer values that map directly to the pixels in my right image.

First question: are your images rectified? (See: https://en.wikipedia.org/wiki/Image_rectification) If yes, you can generate the "right image" from the given left image and the disparity map by shifting each pixel's column (x) coordinate by its disparity. There will be some blank pixels due to occlusions, however.
As you noted, this direct approach cannot place pixels at sub-pixel positions. One option is to round the disparities to integer values and create the image. Another is to create an image that is 2x, 5x, or 10x (or however many times) larger than your input image and use the additional resolution to get "sub-pixel" accuracy for your projected image. You will get some holes this way, though, and would likely need to interpolate to generate a piecewise-smooth final result.
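A minimal sketch of the rounding option, assuming rectified images, a float disparity map defined as d = x_left - x_right, and NaN marking pixels without a disparity. The function name warp_left_to_right and the rule that the larger disparity (the nearer surface) wins when two left pixels land on the same right pixel are illustrative choices, not something stated in the answer.

```python
import numpy as np

def warp_left_to_right(left, disparity):
    """Forward-warp a rectified left image into the right view.

    Assumes disparity[y, x] = x_left - x_right, so x_right = x_left - d.
    Returns the warped image and a mask of the pixels that were filled.
    """
    h, w = disparity.shape
    right = np.zeros_like(left)
    # Largest disparity written to each target pixel so far
    best = np.full((h, w), -np.inf)
    for y in range(h):
        for x in range(w):
            d = disparity[y, x]
            if not np.isfinite(d):
                continue  # no disparity for this pixel
            xr = int(round(x - d))  # round to the nearest target column
            # Keep the nearer surface when several pixels collide
            if 0 <= xr < w and d > best[y, xr]:
                best[y, xr] = d
                right[y, xr] = left[y, x]
    filled = np.isfinite(best)  # False where holes remain
    return right, filled
```

The returned mask marks the holes (occlusions and rounding gaps) that would still need to be interpolated or inpainted.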

Related

Disparity Map vs Left Image(reference image)

I'm really having a hard time matching every pixel of the image with its corresponding disparity value, as the disparity map is way more shifted to the right than the original image. And it looks more stretched than the original.
As you can see, the object in the original image is way thinner than its disparity. How can I fix this?
Since I cannot figure out how to retrieve the original (x,y) of points from the original image when using reprojectImageTo3D

Determining pixel coordinates across display resolutions

If a program displays a pixel at X,Y on a display with resolution A, can I precisely predict at what coordinates the same pixel will display at resolution B?
MORE INFORMATION
The 2 display resolutions are:
A-->1366 x 768
B-->1600 x 900
Dividing the max resolutions in each direction yields:
X-direction scaling factor = 1600/1366 = 1.171303075
Y-direction scaling factor = 900/768 = 1.171875
Say for example that the only red pixel on display A occurs at pixel (1,1). If I merely scale up using these factors, then on display B, that red pixel will be displayed at pixel (1.171303075, 1.171875). I'm not sure how to interpret that, as I'm used to thinking of pixels as integer values. It might help if I knew the exact geometry of pixel coordinates/placement on a screen. e.g., do pixel coordinates (1,1) mean that the center of the pixel is at (1,1)? Or a particular corner of the pixel is at (1,1)? I'm sure diagrams would assist in visualizing this--if anyone can post a link to helpful resources, I'd appreciate it. And finally, I may be approaching this all wrong.
Thanks in advance.
I think your problem is related to the field of scaling/resampling images. Bitmap (or raster) images are the most common way to represent natural images that are rich in detail, such as digital photographs. The term bitmap refers to how a given pattern (bits in a pixel) maps to a specific color. A bitmap image takes the form of an array, where the value of each element, called a pixel (picture element), corresponds to the color of that region of the image.
Sampling
When measuring the value for a pixel, one takes the average color of an area around the location of the pixel. A simplistic model is sampling a square, and a more accurate measurement is to calculate a weighted Gaussian average. When perceiving a bitmap image the human eye should blend the pixel values together, recreating an illusion of the continuous image it represents.
Raster dimensions
The number of horizontal and vertical samples in the pixel grid is called the raster dimensions; it is specified as width x height.
Resolution
Resolution is a measurement of sampling density. The resolution of a bitmap image gives the relationship between pixel dimensions and physical dimensions. The most commonly used measurement is ppi, pixels per inch.
Scaling / Resampling
Image scaling is the process of creating an image with different dimensions from the one we have. Another name for scaling is resampling. Resampling algorithms try to reconstruct the original continuous image and create a new sample grid from it. There are two kinds of scaling: up and down.
Scaling image down
The process of reducing the raster dimensions is called decimation. This can be done by averaging the values of the source pixels contributing to each output pixel.
Scaling image up
When we increase the image size, we want to create sample points between the original sample points in the original raster. This is done by interpolating the values in the sample grid, effectively guessing the values of the unknown pixels. The interpolation can be nearest-neighbor, bilinear, bicubic, etc. Whether it is scaled up or down, the resulting image must still be represented on a discrete grid.
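To make the coordinate question concrete, here is a small sketch of mapping one pixel index between the two resolutions. It assumes the convention that pixel (i, j) covers the square [i, i+1) x [j, j+1), so its center sits at (i + 0.5, j + 0.5). That convention is an assumption for illustration; a given program or display pipeline may use another one. The helper name map_pixel is hypothetical.

```python
import math

def map_pixel(x, y, src_size, dst_size):
    """Map integer pixel index (x, y) from one raster size to another.

    Assumes pixel (i, j) covers [i, i+1) x [j, j+1), i.e. its center is at
    (i + 0.5, j + 0.5). Sizes are (width, height) tuples.
    Returns the fractional destination coordinate of the pixel center and
    the integer index of the destination pixel that contains it.
    """
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    # Scale the continuous center position, not the integer index
    cx = (x + 0.5) * sx
    cy = (y + 0.5) * sy
    fractional = (cx - 0.5, cy - 0.5)
    containing = (math.floor(cx), math.floor(cy))
    return fractional, containing

fractional, containing = map_pixel(1, 1, (1366, 768), (1600, 900))
```

Because a source pixel covers more than one destination pixel when scaling up, the fractional position only says where its center lands. Which destination pixels actually take its color depends on the interpolation method used.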

Understanding Disparity Map in Opencv

Can somebody explain what exactly a disparity map returns? There is not much in the documentation, and I have a few questions about it.
Does it return difference values of pixels with respect to both images?
How to use disparity values in the formula for depth estimation i.e.
Depth = focalLength*Baseline/Disparity
I have read somewhere that disparity map gives a function of depth f(z)
Please explain what it means. If depth is purely an absolute value how can it be generated as a function or is it a function with respect to the pixels?
The difference d = pl − pr of two corresponding image points is called disparity.
Here, pl is the position of the point in the left stereo image and pr is the position of the point in the right stereo image.
For parallel optical axes, the disparity is d = xl − xr
⇒ Searching for depth information is equivalent to searching for disparity, i.e. for corresponding pixels.
The distance is inversely proportional to the disparity.
The disparity values are visualized in a so-called disparity map: the disparity value for each pixel in the reference image (here: left) is coded as a grayscale value. A grayscale value (here: black) is also defined for pixels that do not have any correspondence. The so-called ground-truth map is a disparity map that contains the ideal solution of the correspondence problem.
Relation between Disparity and Depth information:
The following image represent two cameras (left and right) and then tries to find the depth of a point p(x_w, z_x).
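The resulting depth is given by:
Z = f * B / d
where f is the focal length, B is the baseline between the two cameras, and d is the disparity. So it can be seen that the depth is inversely proportional to the disparity.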
UPDATE:
To calculate the disparity, you need two images: (1) the left image and (2) the right image. Let's say there is a pixel at position (60,30) in the left image and that same pixel is present at position (40,30) in the right image; then your disparity will be 60 - 40 = 20. So the disparity map gives you the difference between the positions of corresponding pixels in the left and right images. If a pixel is present in the left image but absent in the right image, then the value at that position in the disparity map will be zero. Once you have the disparity value for each pixel of the left image, you can easily calculate the depth using the formula given above.
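As a rough sketch, the depth formula can be applied to a whole disparity map at once. The function name disparity_to_depth and the choice to return NaN for non-positive disparities are assumptions for illustration. The focal length is assumed to be in pixels, so the depth comes out in the same unit as the baseline.

```python
import numpy as np

def disparity_to_depth(disparity, focal_length_px, baseline):
    """Convert a disparity map to depth with Z = f * B / d.

    Pixels with disparity <= 0 (no correspondence) get depth NaN.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.nan)
    valid = disparity > 0
    depth[valid] = focal_length_px * baseline / disparity[valid]
    return depth

# The example above: x_left = 60, x_right = 40 gives d = 20.
# The focal length and baseline are made-up values for illustration.
depth = disparity_to_depth([[60 - 40, 0]], focal_length_px=700.0, baseline=0.1)
```

Note that if the map comes from cv::StereoBM as 16-bit integers, the values are fixed-point disparities scaled by 16 and need to be divided by 16 first.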

Documentation of CvStereoBMState for disparity calculation with cv::StereoBM

The application of Konolige's block matching algorithm is not sufficiently explained in the OpenCV documentation. The parameters of CvStereoBMState influence the accuracy of the disparities calculated by cv::StereoBM. However, those parameters are not documented. I will list them below and describe what I understand. Maybe someone can add a description of the parameters that are unclear.
preFilterType: Determines which filter is applied to the image before the disparities are calculated. Can be CV_STEREO_BM_XSOBEL (Sobel filter) or CV_STEREO_BM_NORMALIZED_RESPONSE (maybe differences to mean intensity???)
preFilterSize: Window size of the prefilter (width = height of the window, negative value)
preFilterCap: Clips the output to [-preFilterCap, preFilterCap]. What happens to the values outside the interval?
SADWindowSize: Size of the compared windows in the left and in the right image, where the sums of absolute differences are calculated to find corresponding pixels.
minDisparity: The smallest disparity that is taken into account. Default is zero; it should be set to a negative value if negative disparities are possible (depends on the angle between the cameras' views and the distance of the measured object from the cameras).
numberOfDisparities: The disparity search range [minDisparity, minDisparity+numberOfDisparities].
textureThreshold: Calculate the disparity only at locations where the texture is larger than (or at least equal to?) this threshold. How is texture defined??? Variance in the surrounding window???
uniquenessRatio: Cited from calib3d.hpp: "accept the computed disparity d* only if SAD(d) >= SAD(d*)*(1 + uniquenessRatio/100.) for any d != d*+/-1 within the search range."
speckleRange: Unsure.
trySmallerWindows: ???
roi1, roi2: Calculate the disparities only in these regions??? Unsure.
speckleWindowSize: Unsure.
disp12MaxDiff: Unsure, but a comment in calib3d.hpp says that a left-right check is performed. Guess: Pixels are matched from the left image to the right image and from the right image back to the left image. The disparities are only valid if the distance between the original left pixel and the back-matched pixel is smaller than disp12MaxDiff.
speckleWindowSize and speckleRange are parameters for the function cv::filterSpeckles. Take a look at OpenCV's documentation.
cv::filterSpeckles is used to post-process the disparity map. It replaces blobs of similar disparities (the difference between two adjacent values does not exceed speckleRange) whose size is less than or equal to speckleWindowSize (the number of pixels forming the blob) with the invalid disparity value (either short -16 or float -1.f).
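To illustrate the behaviour described above, here is a plain NumPy re-implementation sketch. It is not OpenCV's code: the function name filter_speckles_sketch, the use of 4-connectivity, and the flood-fill strategy are assumptions made to keep the idea readable.

```python
from collections import deque
import numpy as np

def filter_speckles_sketch(disp, invalid_value, speckle_window_size, speckle_range):
    """Replace small blobs of similar disparities with invalid_value.

    A blob is a 4-connected region in which neighbouring values differ by at
    most speckle_range. Blobs with at most speckle_window_size pixels are
    treated as speckles.
    """
    out = disp.copy()
    h, w = out.shape
    visited = np.zeros((h, w), dtype=bool)
    for y0 in range(h):
        for x0 in range(w):
            if visited[y0, x0] or out[y0, x0] == invalid_value:
                continue
            # Flood-fill the blob that contains (y0, x0)
            visited[y0, x0] = True
            blob = [(y0, x0)]
            queue = deque(blob)
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if not (0 <= ny < h and 0 <= nx < w) or visited[ny, nx]:
                        continue
                    if out[ny, nx] == invalid_value:
                        continue
                    # Compare as floats to avoid integer overflow
                    if abs(float(out[ny, nx]) - float(out[y, x])) <= speckle_range:
                        visited[ny, nx] = True
                        blob.append((ny, nx))
                        queue.append((ny, nx))
            if len(blob) <= speckle_window_size:
                for y, x in blob:
                    out[y, x] = invalid_value
    return out
```

For real use, cv::filterSpeckles itself is the better choice; the sketch only shows how the two parameters interact.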
The parameters are better described in the Python tutorial on depth map from stereo images. The parameters seem to be the same.
texture_threshold: filters out areas that don't have enough texture for reliable matching
Speckle range and size: Block-based matchers often produce "speckles" near the boundaries of objects, where the matching window catches the foreground on one side and the background on the other. In this scene it appears that the matcher is also finding small spurious matches in the projected texture on the table. To get rid of these artifacts we post-process the disparity image with a speckle filter controlled by the speckle_size and speckle_range parameters. speckle_size is the number of pixels below which a disparity blob is dismissed as "speckle." speckle_range controls how close in value disparities must be to be considered part of the same blob.
Number of disparities: How many pixels to slide the window over. The larger it is, the larger the range of visible depths, but more computation is required.
min_disparity: the offset from the x-position of the left pixel at which to begin searching.
uniqueness_ratio: Another post-filtering step. If the best matching disparity is not sufficiently better than every other disparity in the search range, the pixel is filtered out. You can try tweaking this if texture_threshold and the speckle filtering are still letting through spurious matches.
prefilter_size and prefilter_cap: The pre-filtering phase, which normalizes image brightness and enhances texture in preparation for block matching. Normally you should not need to adjust these.
Also check out this ROS tutorial on choosing stereo parameters.
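The uniqueness rule quoted from calib3d.hpp in the question can be sketched for a single pixel as follows. The function name passes_uniqueness and the layout of the cost array (one SAD value per candidate disparity) are assumptions for illustration, not OpenCV's internal implementation.

```python
import numpy as np

def passes_uniqueness(sad, uniqueness_ratio):
    """Apply the uniqueness test to the SAD costs of one pixel.

    sad[i] is the cost of the i-th candidate disparity. The best match d* is
    accepted only if every other candidate, except d* - 1 and d* + 1, costs at
    least SAD(d*) * (1 + uniqueness_ratio / 100).
    Returns (index of d*, accepted).
    """
    sad = np.asarray(sad, dtype=np.float64)
    best = int(np.argmin(sad))
    threshold = sad[best] * (1.0 + uniqueness_ratio / 100.0)
    idx = np.arange(sad.size)
    others = np.abs(idx - best) > 1  # skip d* and its direct neighbours
    accepted = bool(np.all(sad[others] >= threshold))
    return best, accepted
```

A larger uniqueness_ratio demands a clearer winner, so more ambiguous pixels are rejected.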

Deforming an image so that curved lines become straight lines

I have an image with free-form curved lines (actually lists of small line segments) overlaid onto it, and I want to generate some kind of image warp that will deform the image in such a way that these curves are deformed into horizontal straight lines.
I already have the coordinates of all the line-segment points stored separately so they don't have to be extracted from the image. What I'm looking for is an appropriate method of warping the image such that these lines are warped into straight ones.
thanks
You can use methods similar to those developed here:
http://www-ui.is.s.u-tokyo.ac.jp/~takeo/research/rigid/
What you do is define an MxN grid of control points that covers your source image.
You then need to determine how to modify each of your control points so that the final image minimizes some energy function (minimum curvature or something of this sort).
The final image is a linear warp determined by your control points (think of it as a 2D mesh whose texture is your source image and whose vertices' positions you're about to modify).
As long as your energy function can be expressed using linear equations, you can solve your problem globally (figuring out where to send each control point) using a linear equation solver.
You express each of your source points (those which lie on your curved lines) using bi-linear interpolation weights of their surrounding grid points, then you express your restriction on the target by writing equations for these points.
After solving these linear equations you end up with destination grid points, then you just render your 2D mesh with the new vertices' positions.
You need to start out with a mapping formula that given an output coordinate will provide the corresponding coordinate from the input image. Depending on the distortion you're trying to correct for, this can get exceedingly complex; your question doesn't specify the problem in enough detail. For example, are the curves at the top of the image the same as the curves on the bottom and the same as those in the middle? Do horizontal distances compress based on the angle of the line? Let's assume the simplest case where the horizontal coordinate doesn't need any correction at all, and the vertical simply needs a constant correction based on the horizontal. Here x,y are the coordinates on the input image, x',y' are the coordinates on the output image, and f() is the difference between the drawn line segment and your ideal straight line.
x = x'
y = y' + f(x')
Now you simply go through all the pixels of your output image, calculate the corresponding point in the input image, and copy the pixel. The wrinkle here is that your formula is likely to give you points that lie between input pixels, such as y=4.37. In that case you'll need to interpolate to get an intermediate value from the input; there are many interpolation methods for images and I won't try to get into that here. The simplest would be "nearest neighbor", where you simply round the coordinate to the nearest integer.
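A minimal sketch of this simplest case, using nearest-neighbor sampling. The function name straighten and the representation of f as an array with one vertical offset per output column are assumptions for illustration.

```python
import numpy as np

def straighten(image, f):
    """Warp an image with the inverse mapping x = x', y = y' + f(x').

    f holds one vertical offset per column: the difference between the drawn
    curve and the ideal straight line at that column. Output pixels whose
    source falls outside the image are left at zero.
    """
    h, w = image.shape[:2]
    f = np.asarray(f, dtype=np.float64)
    out = np.zeros_like(image)
    # Coordinates of every output pixel (y', x')
    ys_out, xs_out = np.mgrid[0:h, 0:w]
    # Nearest-neighbor: round the source row to the nearest integer
    ys_in = np.rint(ys_out + f[xs_out]).astype(int)
    valid = (ys_in >= 0) & (ys_in < h)
    out[ys_out[valid], xs_out[valid]] = image[ys_in[valid], xs_out[valid]]
    return out
```

Swapping the rounding step for bilinear or bicubic sampling would give smoother results at the cost of a little more code.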
