How to check recovered depth - image-processing

I am generating a disparity map from two stereoscopic images, and then I use the usual triangulation formula depth = focal_length * baseline / disparity to get the depth. How can I check that the recovered depth is indeed correct? Is there some test for this? I guess there are tweakable parameters, like multiplying the depth by some factor, but that is more trial and error. I am looking for something more concrete. How do people in the vision community generally verify their results?

I suggest you verify the depth measured in your images by measuring it in the real world. If there were a way to verify, within the images themselves, the measurement you made from the images, you probably would have used that method to measure depth in the first place.
Measure the distance from your camera to some object in the real world, and measure the size of the object perpendicular to the optical axis of one of the cameras. Then measure the same distance and size in your images. Use the real-world size of the object, combined with its size in pixels in the image, to scale the distance you calculate. The result should match the distance you measured.
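For illustration, a minimal sanity check along these lines in Python; the focal length, baseline, and tape-measured distance are placeholder values to be replaced with your own calibration and measurements:

```python
# Values you must supply from your own setup (placeholders here):
focal_length_px = 700.0        # focal length in pixels, from calibration
baseline_mm = 60.0             # distance between the two camera centres, in mm
measured_distance_mm = 1500.0  # tape-measured distance to a reference object

# Disparity (in pixels) of the reference object, read off your disparity map,
# e.g. the median disparity inside a box you drew around the object.
object_disparity_px = 28.0

# Standard stereo triangulation: depth = f * B / d
recovered_depth_mm = focal_length_px * baseline_mm / object_disparity_px

error_mm = recovered_depth_mm - measured_distance_mm
print(f"recovered: {recovered_depth_mm:.1f} mm, "
      f"measured: {measured_distance_mm:.1f} mm, "
      f"error: {error_mm:.1f} mm ({100 * error_mm / measured_distance_mm:.1f}%)")
```

Repeating this for a few objects at different distances shows whether the error is a constant scale factor (usually a wrong baseline or focal length) or grows with distance (usually disparity quantization noise).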

Related

Is it possible to find camera position using 8-10 non-coplanar points, if their 3D coordinates are unknown?

I have a set of non-coplanar points with unknown 3D positions (the number of points is not limited, let's say 8-10 of them), and at least 3 different views (the number of views is also not limited) of these points in 2D images. I also have an estimate of the rotation and scale for every point set in the pictures that corresponds to the real points, as well as an estimate of the Euclidean distance between every two camera positions from which the images were taken.
Is this data enough to find the camera pose (as precisely as possible) after taking another picture of these points? If not, what minimal additional data would I need to achieve this?
UPDATE: In this specific case I needed the function recoverPose() from the calib3d module.
Yes, this is possible. Depending on the algorithms (and the availability of some pre-calibration), you can obtain the relative positions of two cameras using a minimum of 5 to 8 points.
Beware that the point correspondences must be available, i.e. the points must be known in pairs.
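As a rough sketch of that recoverPose() route in Python with OpenCV - the intrinsics and the synthetic scene below are made up purely so the example is self-contained and checkable:

```python
import cv2
import numpy as np

# Hypothetical intrinsics (replace with your calibrated values).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0,   0.0,   1.0]])

# Synthesise a small non-coplanar scene and a known relative pose,
# just so the example can be checked end to end.
rng = np.random.default_rng(0)
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(10, 3))        # 3D points
R_true, _ = cv2.Rodrigues(np.array([[0.0], [0.2], [0.0]]))   # small rotation
t_true = np.array([[1.0], [0.0], [0.0]])                     # baseline direction

def project(X, R, t):
    x = (R @ X.T + t).T
    x = x[:, :2] / x[:, 2:3]
    return (x @ K[:2, :2].T) + K[:2, 2]

pts1 = project(X, np.eye(3), np.zeros((3, 1)))   # view 1: identity pose
pts2 = project(X, R_true, t_true)                # view 2: rotated + translated

# Essential matrix (5-point algorithm + RANSAC), then decompose it.
E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                  prob=0.999, threshold=1.0)
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K)

# t is recovered only up to scale; the known inter-camera distances
# mentioned in the question are what fix the absolute scale afterwards.
print("rotation error:", np.linalg.norm(R - R_true))
print("unit translation:", t.ravel())
```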

Is it possible to get (force) absolute accuracy on AVDepthData from the iPhone X camera?

I need to get the distance from the camera to points in the camera image with AVDepthData. I understand there are two kinds of accuracy associated with AVDepthData: relative and absolute, the latter being the one that corresponds to real-life distance.
I cannot seem to generate an AVDepthData with absolute accuracy. Is it possible at all?
AVDepthData is a generic model object for representing depth maps from a variety of possible sources, including parallax-based disparity inference, time-of-flight-based depth inference, data recorded by third-party cameras, or data synthesized by a 3D rendering engine. Thus, it can represent and describe more types of data than the device you're currently using can capture.
(It's like having an image format that supports 10-bit-per-component color: just because UIImage or some other API can tell you it's holding a wide-color image doesn't mean you have a camera that captures such images.)
More specifically... you didn't say whether you're using the front or back camera on iPhone X, but that matters quite a bit to what kind of depth maps you can capture.
builtInDualCamera, which iPhone X has for the back-facing camera (as do iPhone 7/8 Plus), infers disparity — which is not quite the same as depth, but related — by analyzing the parallax offsets between two camera images. This technique doesn't produce absolute measurements of depth, but because disparity is inversely proportional to depth you can know which points are deeper than others. (And using the cameraCalibrationData you can do some math and maybe get some decent estimates of absolute depth.)
builtInTrueDepthCamera, which iPhone X (and so far only iPhone X) has for its front-facing camera, can measure disparity or depth with time-of-flight analysis. (And sharks with fricking laser beams!) This technique produces absolute measurements pretty well, as long as you can safely assume the speed of light.
Which technique is used determines what kind of measurement you can get, and which technique is used depends on the capture device you select. (And by the way, there's a wealth of information on how these techniques work in the WWDC17 talk on capturing depth.)
If you're looking for back-camera depth measurements in an absolute frame of reference, you might do better to look at ARKit — that's not going to get you accurate depth values for every pixel, because it depends on coarse scene reconstruction, but the distance values you can get are absolute.

How to take stereo images using a single camera?

I want to find the depth map for stereo images. At present I am working with images from the internet, but I want to take stereo images myself so that I can work on my own data. How do I take good stereo images without much noise? I have a single camera. Is it necessary to do rectification? How much distance must be kept between the cameras?
Not sure I've understood your problem correctly - will try anyway.
I guess you're currently working with images from Middlebury or something similar. If you want to use similar algorithms you have to rectify your images, because those algorithms assume that corresponding pixels lie on the same line in all images. If you actually want depth images (!= disparity images) you also need the camera extrinsics.
Your setup should have two cameras, and you have to make sure that they don't change their relative position/orientation - otherwise your rectification will break down. As a first step you have to calibrate your system to get the intrinsic and extrinsic camera parameters. For that you can either use some existing tool or roll your own with (for example) OpenCV's calib3d module. Print out a calibration board (e.g. a chessboard) and calibrate your system with it. Afterwards you can take images and use the calibration to rectify them.
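A rough outline of that calibrate-then-rectify workflow in Python with OpenCV; the board size, square size, and file names are placeholders for your own setup:

```python
import cv2
import numpy as np
import glob

# Hypothetical settings - adjust to your printed board and image files.
board_size = (9, 6)          # inner corners per row/column of the chessboard
square_size = 25.0           # edge length of one square, in mm
left_files = sorted(glob.glob("left_*.png"))
right_files = sorted(glob.glob("right_*.png"))

# 3D coordinates of the board corners in the board's own frame.
objp = np.zeros((board_size[0] * board_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2)
objp *= square_size

obj_pts, left_pts, right_pts = [], [], []
for lf, rf in zip(left_files, right_files):
    gl = cv2.imread(lf, cv2.IMREAD_GRAYSCALE)
    gr = cv2.imread(rf, cv2.IMREAD_GRAYSCALE)
    ok_l, cl = cv2.findChessboardCorners(gl, board_size)
    ok_r, cr = cv2.findChessboardCorners(gr, board_size)
    if ok_l and ok_r:
        obj_pts.append(objp)
        left_pts.append(cl)
        right_pts.append(cr)

size = gl.shape[::-1]  # (width, height)

# Intrinsics per camera, then the extrinsics (R, T) between them.
_, K1, d1, _, _ = cv2.calibrateCamera(obj_pts, left_pts, size, None, None)
_, K2, d2, _, _ = cv2.calibrateCamera(obj_pts, right_pts, size, None, None)
_, K1, d1, K2, d2, R, T, E, F = cv2.stereoCalibrate(
    obj_pts, left_pts, right_pts, K1, d1, K2, d2, size,
    flags=cv2.CALIB_FIX_INTRINSIC)

# Rectification maps: after remapping, matches lie on the same image row.
R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, d1, K2, d2, size, R, T)
map1x, map1y = cv2.initUndistortRectifyMap(K1, d1, R1, P1, size, cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(K2, d2, R2, P2, size, cv2.CV_32FC1)
left_rect = cv2.remap(cv2.imread(left_files[0]), map1x, map1y, cv2.INTER_LINEAR)
right_rect = cv2.remap(cv2.imread(right_files[0]), map2x, map2y, cv2.INTER_LINEAR)
```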
Regarding color-noise:
You could make your aperture very small and use long exposure times. In my opinion this is useless, because real-world situations have to deal with such noise anyway.
In short, there are plenty of stereo images on the internet that are already rectified. If you want to take your own stereo images you have to follow these three steps:
The relationship between distance to the object z (mm) and disparity in pixels D is inverse: z=fb/D, where f is focal length in pixels and b is camera separation in mm. Select b such that you have at least several pixels of disparity;
Even if you know the camera intrinsic matrix and have compensated for radial distortion, you still have to rectify your images to ensure that matches lie on the same row. For this you need to find a fundamental matrix, recover the essential matrix, apply rectifying homographies, and update your intrinsic camera parameters... or use stereo pairs from the Internet.
A low level of noise in the camera image is helped by brightly illuminated scenes, a large aperture, large pixel size, etc.; however, depending on your setup you can still end up with a very noisy disparity map. The way to reduce this noise is to trade off accuracy and use larger correlation windows. Another way to clean up a disparity map is to use various validation techniques, such as
error validation;
uniqueness validation or back-and-forth (left-right) validation;
blob-noise suppression, etc. (a sketch of such a check follows below).
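A minimal sketch of this in Python with OpenCV: StereoSGBM disparity on a rectified pair, a back-and-forth (left-right) consistency check to suppress noise, and the z = f*b/D conversion from step 1. The SGBM parameters and calibration numbers are illustrative, not tuned values:

```python
import cv2
import numpy as np

left = cv2.imread("left_rect.png", cv2.IMREAD_GRAYSCALE)    # rectified pair
right = cv2.imread("right_rect.png", cv2.IMREAD_GRAYSCALE)

# Semi-global block matching; parameters here are only a starting point.
num_disp = 128                      # must be a multiple of 16
matcher_l = cv2.StereoSGBM_create(minDisparity=0,
                                  numDisparities=num_disp,
                                  blockSize=7,
                                  P1=8 * 7 * 7,
                                  P2=32 * 7 * 7,
                                  uniquenessRatio=10)
disp_l = matcher_l.compute(left, right).astype(np.float32) / 16.0

# Back-and-forth check: match right-to-left as well and keep only pixels
# whose left and right disparities agree within a small tolerance.
matcher_r = cv2.StereoSGBM_create(minDisparity=-num_disp + 1,
                                  numDisparities=num_disp,
                                  blockSize=7)
disp_r = matcher_r.compute(right, left).astype(np.float32) / 16.0

h, w = disp_l.shape
xs = np.arange(w)[None, :].repeat(h, axis=0)
x_in_right = np.clip(np.round(xs - disp_l).astype(int), 0, w - 1)
disp_r_warped = -disp_r[np.arange(h)[:, None], x_in_right]
valid = (disp_l > 0) & (np.abs(disp_l - disp_r_warped) <= 1.0)
disp_clean = np.where(valid, disp_l, 0.0)

# Depth from disparity (step 1): z = f * b / D, with f in pixels, b in mm.
f_px, baseline_mm = 700.0, 60.0     # placeholder calibration values
depth_mm = np.where(disp_clean > 0, f_px * baseline_mm / disp_clean, 0.0)
```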
In my experience:
-I did the rectification, so I had to obtain the fundamental matrix, and this may not be estimated correctly for some image pairs.
-A higher-resolution camera is better for the matching. I use OpenCV, which has an implementation of the BRISK descriptor; it was useful for me (see the matching sketch at the end of this answer).
-Try to cover the same area in both shots and try not to do unnecessary rotations.
-Once you understand the theory, OpenCV is a good friend. Here are some results, but I am still working on them:
Depth map:
Rectified images:
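For the BRISK matching mentioned above, a minimal sketch in Python with OpenCV (the file names are placeholders):

```python
import cv2

img1 = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and compute binary BRISK descriptors in both images.
brisk = cv2.BRISK_create()
kp1, des1 = brisk.detectAndCompute(img1, None)
kp2, des2 = brisk.detectAndCompute(img2, None)

# Hamming distance is the right metric for binary descriptors; the ratio
# test discards ambiguous matches before estimating F or disparities.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

vis = cv2.drawMatches(img1, kp1, img2, kp2, good, None)
cv2.imwrite("matches.png", vis)
```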

Triangulation of Rectified Image Points in Multiple Views

I am working with a set of calibrated images that form a ring around a foreground object. I used Fusiello's rectification method to rectify adjacent pairs of images, and then I performed disparity estimation.
When I take the matched points from a stereo pair and triangulate them, it forms an accurate point cloud. Unfortunately, when I triangulate the points from another stereo image pair, this point cloud never aligns correctly with the original cloud.
Should calibrated, rectified images' point clouds merge together automatically?
Thanks in advance for any help you can offer.
This might be due to the accuracy of calibration - both intrinsic (i.e. the same camera model - and how it handles distortion) and extrinsic (i.e. the camera pose in real space). Together, of course, these dictate the ultimate accuracy of your re-projection.
Do you have a measure of error for camera calibration - in terms of MSE re-projection?
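If you don't have that number to hand, here is a rough way to compute it, assuming an OpenCV calibration where you kept the object/image points you passed to cv2.calibrateCamera and the rvecs/tvecs it returned:

```python
import cv2
import numpy as np

def mean_reprojection_error(obj_pts, img_pts, rvecs, tvecs, K, dist):
    """RMS distance (in pixels) between detected and re-projected corners."""
    sq_err, n = 0.0, 0
    for objp, imgp, rvec, tvec in zip(obj_pts, img_pts, rvecs, tvecs):
        proj, _ = cv2.projectPoints(objp, rvec, tvec, K, dist)
        d = imgp.reshape(-1, 2) - proj.reshape(-1, 2)
        sq_err += float((d ** 2).sum())
        n += len(d)
    return np.sqrt(sq_err / n)
```

As a rule of thumb, a mean error much above a pixel tends to show up later as exactly this kind of cloud misalignment.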
In my experience, cumulative error is often noticeable if you simply chain subsequent image pairs. Some form of global optimisation (e.g. bundle adjustment) often needs to be performed first to correct all the camera poses.
The accuracy of your disparity estimation is also a factor - not only in terms of the algorithm you are using, but also in relation to the stereo baseline and how it relates to the size/nature of the object in question (how concave/convex it is), how many images you sample, and the quality of those images (exposure/depth-of-field/etc.).
Fundamentally, just how "off" are your point clouds? Are they close to being aligned (you could run a bit of ICP on the clouds, as sketched below)? Are they closer in the "centre" of the re-projection? Are they worse for projections taken from opposing images on opposite sides of the object?
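A minimal point-to-point ICP sketch in Python with NumPy/SciPy for checking how far off two clouds really are; it assumes the clouds already overlap roughly, which ICP needs in order to converge:

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst (Kabsch)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def icp(source, target, iters=50, tol=1e-6):
    """Align source (N,3) onto target (M,3); returns the moved cloud and RMS error."""
    src = source.copy()
    tree = cKDTree(target)
    prev_err = np.inf
    for _ in range(iters):
        dist, idx = tree.query(src)              # nearest-neighbour correspondences
        R, t = best_rigid_transform(src, target[idx])
        src = src @ R.T + t
        err = np.sqrt((dist ** 2).mean())
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return src, err

# aligned_cloud, rms = icp(cloud_from_pair_2, cloud_from_pair_1)
```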
Remember as well that (due to the discrete sampling) you shouldn't expect points to ever land exactly on top of one another. Some form of binning operation usually occurs in the triangulation pipeline to handle this (hence much of the research work around visual hull -> voxels -> marching cubes -> triangulated surfaces).
Have you checked out MeshLab BTW?

How would you find the height of objects given an image?

This isn't exactly a programming question. I just want to know what your approach would be to a common problem in digital image processing.
Let's say you have an image of a few trees in, say, JPG format. How would you go about finding the height of each of these trees? The photo is the only input you have.
I want to know your approaches, not code. So it doesn't matter if your answers are vague or non-DIP-ish.
Small correction:
The height need not be the actual height of the tree. The height can be on any scale, but it should be consistent across all objects in the picture.
Yes, it is possible. What you are describing has an entire industry around it, called photogrammetry.
There is a fair amount of computer vision research in this area. Assuming you don't know the camera constraints, you'll have to make assumptions about the scene and camera to determine the heights up to a scale factor. Note that without camera constraints or a reference height in the image it is impossible to tell the difference between a tall tree photographed from a distance or a short tree photographed up close. A great start is the Single View Metrology work by Criminisi.
It is simple to find the size of an object from images using Photogrammetry.
Photogrammetry is the science of making measurements from photographs.
For this we need to know two things:
the distance between the camera and the object;
the focal length (in mm) and the pixels per millimeter on the sensor, or equivalently the physical size of the image sensor.
Following are the steps:
Calibrate the Camera
Use OpenCV to calibrate the camera. You can use the OpenCV calibrate.py tool and the chessboard pattern PNG provided in the source code to generate a calibration matrix. Camera calibration is done to find the camera parameters. I took about a dozen photos of the chessboard from as many angles as I could with my webcam (to calibrate it). For more details, check the OpenCV camera calibration documentation.
We will get f_x, f_y, c_x, c_y from the calibration matrix.
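If you prefer calling the API directly instead of the calibrate.py tool, a minimal sketch (the chessboard size and file pattern are placeholders):

```python
import cv2
import numpy as np
import glob

board = (9, 6)                               # inner corners of the chessboard
objp = np.zeros((board[0] * board[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2)

obj_pts, img_pts = [], []
for path in glob.glob("chessboard_*.jpg"):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, board)
    if found:
        obj_pts.append(objp)
        img_pts.append(corners)

ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_pts, img_pts, gray.shape[::-1], None, None)

f_x, f_y = K[0, 0], K[1, 1]                   # focal length in pixels
c_x, c_y = K[0, 2], K[1, 2]                   # principal point in pixels
```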
Checking the details of the photos you took, you will find the native resolution of the photos (height x width), and in their EXIF headers you can find the focal length value (f, in mm). These values vary depending on your camera.
Pixels per millimeter
We need to know the pixels per millimeter (px/mm) on the image sensor.
f_x=f*m_x
f_y=f*m_y
Since we know two of the three variables in each formula, we can solve for m_x and m_y. I just averaged f_x and f_y to get f_xy.
m=f_xy/focal_length_of_camera
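As a small numeric sketch of this step (the calibration and EXIF values are hypothetical):

```python
# From the calibration matrix (previous step) - hypothetical values:
f_x, f_y = 3060.0, 3070.0        # focal length in pixels
f_mm = 4.15                      # focal length from the EXIF header, in mm

f_xy = (f_x + f_y) / 2.0         # average, as described above
m = f_xy / f_mm                  # pixels per millimeter on the sensor
print(f"m = {m:.1f} px/mm")
```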
Load the image
Load the image in which you want to measure the object. You should know the distance between the object and the camera. Find the dimensions of this image (height1 x width1).
Find the Object size in pixels
Determine the size of the object in pixels. I simply used the distance formula to find the length of a selected line; you can adopt any other method.
Convert px/mm to the lower resolution
pxpermm_in_lower_resolution = (width1*m)/width
Size of the object on the image sensor
size_of_object_in_image_sensor = object_size_in_pixels/(pxpermm_in_lower_resolution)
Actual size of object
With the above data, the actual size of the object can be found as:
real_size = (dist*size_of_object_in_image_sensor)/focal_length
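Putting the remaining steps together in a short Python sketch; all the numbers are placeholders, and object_size_in_pixels stands in for whatever measuring method you use:

```python
import math

# Known / measured inputs (hypothetical values):
dist_mm = 4000.0                 # camera-to-object distance, measured
focal_length_mm = 4.15           # from EXIF
m = 740.0                        # px/mm on the sensor, from the previous step
width = 4032                     # native sensor resolution width (px)
width1 = 1920                    # width of the (possibly downscaled) photo (px)

# Object size in pixels in the photo, e.g. from two clicked endpoints:
(x1, y1), (x2, y2) = (410, 300), (410, 1210)
object_size_in_pixels = math.hypot(x2 - x1, y2 - y1)

# The formulas from the text:
pxpermm_in_lower_resolution = (width1 * m) / width
size_of_object_in_image_sensor = object_size_in_pixels / pxpermm_in_lower_resolution
real_size_mm = (dist_mm * size_of_object_in_image_sensor) / focal_length_mm
print(f"estimated real size: {real_size_mm:.0f} mm")
```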
Assuming everything is at the same distance and to the same scale, you'd want to find a single unit of measurement you can guarantee. For example, if there's a person in the photo, at the same scale, and you know they're exactly 6 feet tall, you use that as your measure. You then count how many of them, stacked, make up the tree. For example, if you need 3.5 of this person, then:
3.5 * 6 = 21
gives you a 21 foot tall tree.
Without a single point of reference for everything, or if they're all on different scales, you would need a lot more information than you could easily get without having been there.
I would rely on an object of known dimensions to be present in the picture. For instance, a man.
Or perhaps we could use the EXIF data to reverse-engineer the size of the object based on the camera's sensor dimensions, the lens, and the focal length used. This again depends on the angle; we should get the most accurate results when the camera is held perpendicular to the subject.
If your image is 3x3 pixels and you want to find its size: 3x3 = 9 pixels, indexed 0 up to 8. So 9/8 = (___) KB.
If you want the size of the image in MB, then as in the example above, just do (9/8)/1024 = (----) MB.
So you will get the result in MB.
