What is the definition of a "disparity map"?

I've been asked to implement an edge-based disparity map, but I fundamentally don't understand what a disparity map is. What is the definition of a "disparity map"?

A disparity map encodes the apparent pixel difference, or motion, between a pair of stereo images. To experience this, try closing one eye, then rapidly opening it while closing the other. Objects that are close to you will appear to jump a significant distance, while objects further away will move very little. That motion is the disparity.
In a pair of images derived from stereo cameras, you can measure the apparent motion in pixels for every point and make an intensity image out of the measurements.
In such an image, objects in the foreground appear brighter, denoting greater apparent motion and therefore lesser distance from the camera.
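For illustration, here is a minimal OpenCV sketch that computes such a map from an already-rectified stereo pair; the file names and matcher parameters are placeholder choices, not a recommendation.

```python
# Minimal sketch: compute a disparity map from a rectified stereo pair.
# "left.png" / "right.png" are hypothetical file names for an already-rectified pair.
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Semi-global block matching; numDisparities must be a multiple of 16.
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=7)

# StereoSGBM returns fixed-point disparities scaled by 16.
disparity = matcher.compute(left, right).astype(np.float32) / 16.0

# Normalize to 0-255 for display: near objects (large disparity) appear bright.
vis = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
cv2.imwrite("disparity.png", vis)
```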

Related

How does eye-tracking calibration help gaze estimation?

I have been asked to create the calibration for an eye-tracking algorithm. However, I still don't really understand how calibration makes our gaze estimation more accurate, nor how calibration in eye tracking actually works. I have read https://www.tobiidynavox.com/support-training/eye-tracker-calibration/, as well as https://developer.tobii.com/community/forums/topic/explain-calibration/, but I still don't fully understand it. I would appreciate it if somebody could explain it to me.
Thank you
In the answer below, I assume that you are referring to standard pupil-centre corneal-reflection video-oculography rather than any other form of eye tracking technology.
In eye tracking, calibration is the process that transforms the coordinates of features located in a two dimensional still video frame of the eye into gaze coordinates (i.e. coordinates that are related to the world being observed). For example, let's say your eye tracker produces a 400 × 400 pixel image of the eye, and the subject is looking at a screen that is 1024 × 768 pixels in size, some distance in front of them. The calibration process needs to relate coordinates in the eye image to where the person is looking (i.e. gazing) at on the display screen. This process is not trivial: just because the pupil is centred in the eye image does not mean that the person is looking at the centre of the display in the world, for example. And the position of the pupil centre could move within the eye image even though the direction of gaze is held constant in the world. This is why we track the centre of the pupil and the corneal reflection, as the vector linking the two is robust to translation of the eye within the image that occurs in the absence of a gaze rotation.
A standard way to do this mapping is via relatively straightforward 2D non-linear regression: you move a target at known coordinates on the display and ask the participant to fixate steadily on each, while recording the location of the pupil centre and corneal reflection in the eye image. The calibration process will map the vector linking the pupil centre and the corneal reflection to the corresponding known gaze coordinates. This produces a regression solution that allows you to map intermediate locations to their interpolated gaze coordinates.
(An alternative, or supplementary, approach is model-based rather than regression-based, but let's not go there right now.)
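To make the regression idea concrete, here is a minimal numpy sketch of the mapping described above. The target layout, the measured vectors, and the choice of a second-order polynomial are purely illustrative; they are not what any particular tracker uses.

```python
# Minimal sketch of regression-based calibration: map pupil-corneal-reflection
# vectors (vx, vy), recorded while the participant fixates known targets,
# to display coordinates. All numbers are made up for illustration.
import numpy as np

# Nine calibration targets in a 3x3 grid on a 1024x768 display (pixels).
screen = np.array([[x, y] for y in (100, 384, 668) for x in (112, 512, 912)],
                  dtype=float)
# Corresponding pupil-CR vectors measured in the eye image (made-up values).
v = np.array([[0.10, 0.12], [0.25, 0.11], [0.40, 0.13],
              [0.11, 0.30], [0.26, 0.29], [0.41, 0.31],
              [0.12, 0.48], [0.27, 0.47], [0.42, 0.49]])

def design(vecs):
    """Second-order polynomial terms in vx, vy."""
    vx, vy = vecs[:, 0], vecs[:, 1]
    return np.column_stack([np.ones_like(vx), vx, vy, vx * vy, vx**2, vy**2])

# Two least-squares fits, one per display axis (x and y).
coeffs, *_ = np.linalg.lstsq(design(v), screen, rcond=None)

# Any subsequent vector can now be mapped to an interpolated gaze point:
gaze = design(np.array([[0.30, 0.25]])) @ coeffs
print(gaze)   # estimated display coordinates in pixels
```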
So in essence, calibration doesn't improve gaze estimation, it provides gaze estimation. Without first doing a calibration, all you are doing is tracking the movements of features (the pupil and corneal reflection) within a relatively arbitrary image of the eye. Until calibration is carried out, you have no idea at that stage where that eye is actually pointing in the world.
Having said all that, this is not at all a coding-based question (or answer), so I'm not actually sure that StackOverflow is the ideal venue to be asking it in.

3D reconstruction using stereo camera

I am trying to construct a 3D point cloud and measure the real sizes or distances of objects using a stereo camera. The cameras are stereo calibrated, and I find 3D points using the reprojection matrix Q and the disparity.
My problem is that the calculated sizes change depending on the distance from the cameras. When I calculate the distance between two 3D points, it should stay constant, but as the object gets closer to the camera the distance increases.
Am I missing something? The 3D coordinates should be in camera coordinates, not in pixel coordinates, so this seems inaccurate to me. Any ideas?
You didn't mention how far apart your cameras are - the baseline. If they are very close together compared with the distance of the point that you are measuring, a slight inaccuracy in your measurement can lead to a big difference in the computed distance.
One way you can check if this is the problem is by testing with only lateral movement of the camera.
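To see the effect, here is a small back-of-the-envelope calculation using the pinhole stereo relation Z = f * B / d with made-up numbers (focal length, baseline, and a half-pixel matching error); with a short baseline, the same disparity error produces a much larger depth error at range.

```python
# Illustrative numbers only: how a half-pixel disparity error turns into depth
# error, using Z = f * B / d (focal length f in pixels, baseline B in metres,
# disparity d in pixels).
f_px = 800.0        # assumed focal length in pixels
baseline_m = 0.06   # assumed 6 cm baseline
disp_err_px = 0.5   # half-pixel matching error

for depth_m in (0.5, 1.0, 2.0, 4.0):
    d = f_px * baseline_m / depth_m                      # ideal disparity
    depth_meas = f_px * baseline_m / (d - disp_err_px)   # depth with the error
    print(f"Z = {depth_m:3.1f} m  d = {d:5.1f} px  "
          f"depth error = {(depth_meas - depth_m) * 100:5.2f} cm")
```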

How to get the coordinates of the moving object with GPUImage motionDetection- swift

How would I go about getting the screen coordinates of something that enters the frame with the motionDetection filter? I'm fairly new to programming, and would prefer a Swift answer if possible.
Example: I have the iPhone pointing at a wall, monitoring it with the motionDetector. If I bounce a tennis ball against the wall, I want the app to place an image of a tennis ball on the iPhone display at the same spot it hit the wall.
To do this, I would need the coordinates of where the motion occurred.
I thought maybe the "centroid" argument did this.... but I'm not sure.
I should point out that the motion detector is pretty crude. It works by taking a low-pass filtered version of the video stream (a composite image generated by a weighted average of incoming video frames) and then subtracting that from the current video frame. Pixels that differ above a certain threshold are marked. The number of these pixels, along with the centroid of the marked pixels, is provided as a result.
The centroid is a normalized (0.0-1.0) coordinate representing the centroid of all of these differing pixels. A normalized strength gives you the percentage of the pixels that were marked as differing.
Movement in a scene will generally cause a bunch of pixels to differ, and for a single moving object the centroid will generally be the center of that object. However, it's not a reliable measure, as lighting changes, shadows, other moving objects, etc. can also cause pixels to differ.
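If it helps to see the mechanism outside GPUImage, here is a rough Python/OpenCV analogue of that pipeline (running average, difference, threshold, centroid). This is not the GPUImage implementation; the averaging weight and threshold are arbitrary choices.

```python
# Rough analogue of the motion detector described above: running low-pass
# average, frame difference, threshold, then centroid of the differing pixels.
import cv2
import numpy as np

cap = cv2.VideoCapture(0)          # any video source
background = None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)

    if background is None:
        background = gray.copy()
    # Weighted running average = low-pass filtered version of the stream.
    cv2.accumulateWeighted(gray, background, 0.05)

    # Pixels differing from the background above a threshold count as "moving".
    diff = cv2.absdiff(gray, background)
    ys, xs = np.nonzero(diff > 25)
    if len(xs) > 0:
        # Normalized centroid (0.0-1.0), comparable to GPUImage's output.
        cx = xs.mean() / gray.shape[1]
        cy = ys.mean() / gray.shape[0]
        strength = len(xs) / diff.size
        print(f"centroid = ({cx:.2f}, {cy:.2f}), strength = {strength:.3f}")
```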
For true object tracking, you'll want to use feature detection and tracking algorithms. Unfortunately, the framework as published does not have a fully implemented version of any of these at present.

OpenCV - calibrate camera using static images in water

I have a photocamera mounted vertically under water in a tank, looking downwards.
There is a flat grid on the bottom of the tank (approx 2m away from the camera).
I want to be able to place markers on the bottom, and use computer vision to know their real life exact position.
So, I need to map from pixels to mm.
If I am not mistaken, cv::calibrateCamera(...) does just this, but is dependent on moving a pattern in front of the camera.
I have just static pictures of the scene, and the camera never moves in relation to the grid. Thus, I have only a "single" image to find the parameters.
How can I do this using the grid?
Thank you.
Interesting problem! The "cute" part is the effect on the intrinsic parameters of the refraction at the water-glass interface, namely to increase the focal length (or, conversely, to reduce the field of view) compared to the same lens in air. In theory, you could calibrate in air and then correct for the difference in refraction index, but calibrating directly in water is likely to give you more accurate results.
Do you know your accuracy requirements? And have you verified that your lens/sensor combination is adequate to meet them (with an adequate margin)? To answer that question you need to estimate (either by calculation from the lens and sensor specifications, or experimentally using a resolution chart) whether you can resolve in an image the minimal distances required by your application.
From the wording of your question I think that you are interested only in measurements on a single plane. So you only need to (a) remove the nonlinear (barrel or pincushion) lens distortion and (b) estimate the homography between the plane of interest and the image. Once you have the latter, you can directly convert from undistorted image coordinates to world ones by matrix multiplication. Additionally if (as I imagine) the plane of interest is roughly parallel to the image plane, you should not have any problem keeping the entire field-of-view in focus.
Of course, for all of this to work as expected, you should make sure that the tank bottom is really flat, within the measurement tolerances of your application. Otherwise you are really dealing with a 3D problem, and need to modify your procedures accordingly.
The actual procedure depends a lot on the size of the tank, which you don't indicate clearly. If it's small enough that it is practical to manufacture a chessboard-like movable calibration target, by all means go for it, following the standard chessboard calibration procedure. In the following I'll discuss the more interesting case in which your tank is large, e.g. the size of a swimming pool.
I'd proceed by sticking calibration markers in a regular grid at the pool bottom. I'd probably choose checker-like markers, maybe printing them myself with a good laser printer on plastic with an adhesive backing (assuming you can leave them in place forever). You should plan on having quite a few of them, say an 8x8 or 10x10 grid, covering as much as possible of the field of view of the camera in its operating position and pose. To help with lining up the grid nicely you might use a laser line projector of suitable fan angle, or a laser pointer attached to a rotating support. Note carefully that it is not necessary that they be affixed in a precise X-Y grid (which may be complicated, depending on the size of your pool), only that their positions with respect to any arbitrarily chosen (but fixed) three of them be known. In other words, you can attach them to the bottom approximately in a grid, then measure the distances of three extreme corners from each other as accurately as you can, thus building a base triangle, then measure the distances of all the other corners from the vertices of the triangle, and finally reconstruct their true positions with a bit of trigonometry, as sketched below. It's basically a surveying problem and, depending on your accuracy requirements and budget, you may want to enlist a friendly local professional surveyor (and their tools) to get it done as precisely as necessary.
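As a sketch of that last reconstruction step (assuming the bottom really is flat, so the problem is 2D), here is one way to trilaterate a marker's position from its measured distances to the three base-triangle vertices. All distances are made-up numbers in metres.

```python
# Recover a marker's 2D position on the (assumed flat) bottom from its measured
# distances to the three base-triangle vertices A, B, C.
import math

def trilaterate(d_ab, d_ac, d_bc, r_a, r_b, r_c):
    """Place A at the origin and B on the x-axis, then solve for the marker."""
    # Coordinates of C in this local frame.
    cx = (d_ac**2 - d_bc**2 + d_ab**2) / (2 * d_ab)
    cy = math.sqrt(d_ac**2 - cx**2)
    # Intersect the three distance circles (standard trilateration).
    x = (r_a**2 - r_b**2 + d_ab**2) / (2 * d_ab)
    y = (r_a**2 - r_c**2 + cx**2 + cy**2 - 2 * cx * x) / (2 * cy)
    return x, y

# Base triangle with sides |AB| = 6.0, |AC| = 4.0, |BC| = 5.0 (tape-measured),
# and a marker at 2.5 m from A, 4.27 m from B, 1.82 m from C.
print(trilaterate(6.0, 4.0, 5.0, 2.5, 4.27, 1.82))   # ≈ (2.0, 1.5)
```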
Once you have your grid, you can fill the pool, get your camera, and focus and f-stop the lens as needed for the application. From then on you may not touch the focus and f-stop ever again, under penalty of miscalibrating - exposure can only be controlled by the exposure time, so make sure to have enough light. Disable any and all auto-focus and auto-iris functions, if present. If the camera has a non-rigid lens mount (e.g. a DSLR), you'll need some kind of mechanical rig to ensure that the lens-body pair stays rigid. Stop the lens down as far as you can, given the available lighting and sensor, so as to have a fair bit of depth of field available. Then take several photos (~10) of the grid, moving and rotating the camera, and going a bit closer and farther away than your expected operating distance from the plane. You'll want to "see" in some images some significant perspective foreshortening of the grid - this is needed to accurately calibrate the focal length. Avoid JPG and any other lossy compression format when storing the images - use lossless PNG or TIFF.
Once you have the images, you can manually mark and identify the checker markers in the images. For a once-off project like this I would not bother with automatic identification, just do it manually (e.g. in Matlab, or even in Photoshop or Gimp). To help identify the markers, you could, e.g., print a number next to them. Once you have the manual marks, you can refine them automatically to subpixel accuracy, e.g. using cv::cornerSubPix.
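For the refinement step, a minimal sketch follows; the file name, the marked coordinates, the window size and the termination criteria are all placeholders, not prescriptions.

```python
# Minimal sketch: refine manually marked corner locations to subpixel accuracy.
# "marks" would come from your manual annotation; the values are placeholders.
import cv2
import numpy as np

gray = cv2.imread("grid_photo_01.png", cv2.IMREAD_GRAYSCALE)   # hypothetical file
marks = np.array([[412.0, 305.0], [788.0, 311.0]],
                 dtype=np.float32).reshape(-1, 1, 2)

criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 40, 0.001)
refined = cv2.cornerSubPix(gray, marks, winSize=(11, 11), zeroZone=(-1, -1),
                           criteria=criteria)
```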
You're almost done. Feed the "reference" measured position of the real corners, and the observed ones in all images, to your favorite camera calibration routine, e.g. cv::calibrateCamera. You use the nominal focal length of the camera (converted to pixels) for an initial estimate, along with null distortion. If all goes well, you will obtain the camera intrinsic parameters, which you will keep, and the camera poses at all images, which you'll throw away.
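A minimal, self-contained sketch of that call is below. Since I obviously don't have your photos, the "observed" corners are synthesized by projecting a planar grid through a made-up camera; with real data, the image points would come from your marked photos and the object points from your surveyed marker positions.

```python
# Sketch of the calibration step: synthesize a planar grid of "markers", project
# it through a known camera from several poses, then recover the intrinsics with
# cv2.calibrateCamera using a nominal-focal-length initial guess.
import cv2
import numpy as np

# 10x10 grid of markers on the bottom plane (Z = 0), 0.5 m spacing.
xs, ys = np.meshgrid(np.arange(10) * 0.5, np.arange(10) * 0.5)
grid = np.column_stack([xs.ravel(), ys.ravel(), np.zeros(100)]).astype(np.float32)

image_size = (1920, 1080)
K_true = np.array([[1750.0, 0, 960.0], [0, 1750.0, 540.0], [0, 0, 1]])
dist_true = np.array([-0.25, 0.08, 0.0, 0.0, 0.0])

object_points, image_points = [], []
rng = np.random.default_rng(0)
for i in range(10):
    rvec = rng.uniform(-0.3, 0.3, 3)              # varied orientations
    tvec = np.array([-2.0, -2.0, 3.0 + 0.3 * i])  # varied distances
    pts, _ = cv2.projectPoints(grid, rvec, tvec, K_true, dist_true)
    object_points.append(grid)
    image_points.append(pts.astype(np.float32))

K0 = np.array([[1800.0, 0, 960.0], [0, 1800.0, 540.0], [0, 0, 1]])  # nominal guess
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    object_points, image_points, image_size, K0, np.zeros(5),
    flags=cv2.CALIB_USE_INTRINSIC_GUESS)
print("reprojection RMS (px):", rms)   # keep K and dist; discard the poses
```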
Now you can mount the camera in your final setup, as needed by your application, and take one further image of the grid. Mark and refine the corner positions as before. Undistort their image positions using the distortion parameters returned by the calibration. Finally compute the homography between the reference positions of the real markers (in meters) and their undistorted positions, and you're done.
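And a sketch of this final step, again with synthetic stand-ins for the single photo taken in the operating pose; the intrinsics, distortion and marker layout below are placeholders for your calibrated values and surveyed positions.

```python
# Undistort the marked image positions of the reference markers, fit the
# plane-to-image homography, then convert any new pixel to metres on the plane.
import cv2
import numpy as np

K = np.array([[1750.0, 0, 960.0], [0, 1750.0, 540.0], [0, 0, 1]])
dist = np.array([-0.25, 0.08, 0.0, 0.0, 0.0])     # from the calibration step

world = np.array([[x, y] for y in np.arange(5) * 0.5
                  for x in np.arange(5) * 0.5], dtype=np.float32)   # markers, metres
world3d = np.column_stack([world, np.zeros(len(world))]).astype(np.float32)

# Synthetic "observed" marker pixels for some camera pose above the plane.
rvec, tvec = np.array([0.05, -0.03, 0.01]), np.array([-1.0, -1.0, 2.0])
observed, _ = cv2.projectPoints(world3d, rvec, tvec, K, dist)

# (a) Remove lens distortion, keeping pixel units by passing K as P ...
undist = cv2.undistortPoints(observed, K, dist, P=K).reshape(-1, 2)
# (b) ... then estimate the homography from undistorted pixels to the plane.
H, _ = cv2.findHomography(undist.astype(np.float32), world)

# Any new (undistorted) pixel can now be mapped to metres on the bottom plane.
pixel = undist[7].reshape(1, 1, 2)
print(cv2.perspectiveTransform(pixel, H))   # ≈ world[7]
```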
HTH
To calibrate the camera you do need multiple images of the checkerboard (or one of the other supported calibration patterns). What you can do is calibrate the camera outside of the water, or do a calibration sequence once.
Once you have that information (focal length, lens centre, distortion, etc.), you can use the solvePnP function to estimate the pose of a single board. This estimate also gives you the distance from the camera to the board.
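A small sketch of that idea; the intrinsics, board geometry and pose are synthetic placeholders so the example runs on its own, and with real images the corners would come from something like cv2.findChessboardCorners.

```python
# With known intrinsics and distortion, a single view of the board gives its
# pose via cv2.solvePnP; the translation vector's norm is the camera-to-board
# distance. All values below are synthetic.
import cv2
import numpy as np

K = np.array([[1400.0, 0, 640.0], [0, 1400.0, 360.0], [0, 0, 1]])
dist = np.zeros(5)

# 3D corner positions of a 7x5 board with 30 mm squares (board frame, Z = 0).
obj = np.array([[x * 0.03, y * 0.03, 0.0] for y in range(5) for x in range(7)],
               dtype=np.float32)

# Image positions of those corners, synthesized here from a known pose.
rvec_true, tvec_true = np.array([0.1, -0.2, 0.05]), np.array([-0.1, -0.05, 2.0])
img, _ = cv2.projectPoints(obj, rvec_true, tvec_true, K, dist)

ok, rvec, tvec = cv2.solvePnP(obj, img, K, dist)
print("distance to board (m):", np.linalg.norm(tvec))   # ≈ 2.0
```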
A completely different alternative could be to find what kind of lens the camera uses and manually fill in the data. I've not tried this, so I'm uncertain how well this would work.

Triangulation of Rectified Image Points in Multiple Views

I am working with a set of calibrated images that form a ring around a foreground object (1). I used Fusiello's method (1) to rectify adjacent pairs of images, and then I performed disparity estimation.
When I take the matched points from a stereo pair and triangulate them, it forms an accurate point cloud. Unfortunately, when I triangulate the points from another stereo image pair, this point cloud never aligns correctly with the original cloud.
Should calibrated, rectified images' point clouds merge together automatically?
Thanks in advance for any help you can offer.
This might be due to the accuracy of calibration - both intrinsic (i.e. the same camera model - and how it handles distortion) and extrinsic (i.e. the camera pose in real space). Together, of course, these dictate the ultimate accuracy of your re-projection.
Do you have a measure of error for the camera calibration, e.g. the mean squared re-projection error?
Cumulative error is often noticeable in my experience if you simply iterate over subsequent images. Some form of global optimisation often needs to be performed first to correct all of the camera poses.
The accuracy of your disparity estimation is also a factor - not only in terms of the algorithm you are using, but also in relation to the stereo baseline and how it relates to the size/nature of the object in question (how concave/convex it is), and how many images you are sampling (and the quality of those images - exposure/depth-of-field/etc.).
Fundamentally, just how "off" are your point clouds? Are they close to being aligned (you could do a bit of ICP before triangulation...)? Are they closer in the "centre" of the re-projection? Are they worse for projections taken from opposing images on opposite sides of the object?
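If you want to quantify how far off they are, the rigid-fit step that ICP iterates is easy to run on a handful of corresponding points. Here is a minimal numpy sketch (Kabsch/Procrustes), with a toy cloud standing in for your data.

```python
# Best-fit rotation R and translation t mapping src onto dst, i.e. minimizing
# ||R @ src_i + t - dst_i||. ICP alternates this with nearest-neighbour matching.
import numpy as np

def rigid_fit(src, dst):
    """Rigid transform aligning src to dst (both Nx3, corresponding rows)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t

# Toy check: a cloud moved by a known rotation/translation is recovered.
rng = np.random.default_rng(1)
cloud = rng.normal(size=(100, 3))
angle = np.deg2rad(10)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0],
                   [np.sin(angle),  np.cos(angle), 0],
                   [0, 0, 1]])
moved = cloud @ R_true.T + np.array([0.2, -0.1, 0.05])
R, t = rigid_fit(cloud, moved)
residual = np.linalg.norm(cloud @ R.T + t - moved, axis=1).mean()
print("mean residual after alignment:", residual)   # ~ 0
```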
Remember as well that (due to the discrete sampling) you shouldn't expect points to ever be re-projected exactly on top of one another. Some form of binning operation during the triangulation pipeline usually occurs to handle this (hence most of the research work in visual hull -> voxels -> marching cubes -> triangulated surface around this...).
Have you checked out MeshLab BTW?
