I have a project that aims to detect the distance to particular objects (e.g. traffic signs).
I have a calibrated stereo rig, and the first thing I did was to compute the disparity image and then the depth. However, since I only need the distance to particular objects in the scene, I figured that calculating a full disparity map is a rather long and heavy task, so I switched to a feature-detection method.
The idea is the following: I find matching features in both images and then compute the disparity (just subtract one matched feature point from the other) only inside the specified bounding boxes (I have attached the image).
The feature detector works correctly, but when I convert these disparities to actual depth, I get bad results with a huge error. I convert them with the following formula:
disparity = feature_matched1.x - feature_matched2.x
depth = baseline * focal / disparity.
The calibration parameters seem to be correct and are not the issue.
I want to ask whether I am doing this properly and whether it is possible to find the depth this way. Maybe I am relying on some false assumptions and depth cannot be computed with this method.
I can provide code if necessary; however, I think this is more of an approach-related question.
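For reference, this is essentially what I do (a minimal Python/OpenCV sketch; `focal_px`, `baseline_m` and the keypoint/match variables are placeholders, and it assumes the pair is rectified so that the x-difference really is the disparity):

```python
import numpy as np

def depth_from_matches(kp_left, kp_right, matches, focal_px, baseline_m):
    """Convert matched keypoints inside a bbox into metric depths."""
    depths = []
    for m in matches:
        xl = kp_left[m.queryIdx].pt      # left image was the query
        xr = kp_right[m.trainIdx].pt     # right image was the train set
        # On rectified images a valid match lies on the same scanline;
        # reject matches with a large vertical offset.
        if abs(xl[1] - xr[1]) > 1.0:
            continue
        disparity = xl[0] - xr[0]        # positive for points in front of the rig
        if disparity <= 0:
            continue
        depths.append(focal_px * baseline_m / disparity)
    return np.array(depths)
```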
Related
The traditional solution for high-resolution images is, for example:
extract features (dense) for all images
match features to find tracks through images
triangulate features to 3D points.
I see two problems here for my case (many 640*480 images with small movements between them). First: matching is very slow, especially if the number of images is big, so a better solution could be optical-flow tracking, but that becomes sparse with big moves (a mix could solve the problem!).
Second: triangulating the tracks. Although it is an over-determined problem, I find it hard to code a solution (here I am asking for a simplification of what I read in the references).
I searched quite a bit for libraries in that direction, with no useful result.
Again, I have ground-truth camera matrices and need only the 3D positions as a first estimate (without BA).
A coded software solution would be of great help, as I don't want to reinvent the wheel, though detailed instructions may also be helpful.
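For illustration, a minimal multi-view linear (DLT) triangulation could look like this (Python/NumPy sketch, no bundle adjustment; `Ps` and `xs` are placeholder names for the ground-truth 3x4 camera matrices and the 2D observations of one track):

```python
import numpy as np

def triangulate_track(Ps, xs):
    """Least-squares 3D point from >= 2 views via SVD.

    Ps: list of 3x4 camera matrices, xs: list of (x, y) observations,
    one per camera, for the same track.
    """
    A = []
    for P, (x, y) in zip(Ps, xs):
        # Each view contributes two linear equations in the homogeneous point X.
        A.append(x * P[2] - P[0])
        A.append(y * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]                 # right singular vector of the smallest singular value
    return X[:3] / X[3]        # de-homogenise
```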
This basically shows the underlying geometry for estimating the depth.
As you said, we have the camera poses Q, and we pick a point X in the world; X_L is its projection on the left image. Now, with Q_L, Q_R and X_L, we can construct the green epipolar plane, and the rest of the job is easy: we search along the ray (Q_L, X), which exactly parameterises the depth of X_L. With different depth hypotheses X1, X2, ..., we get different projections on the right image.
Now we compare the pixel-intensity difference between X_L and each reprojected point on the right image, pick the smallest one, and the corresponding depth is exactly what we want.
Pretty easy, hey? The truth is it's way harder: the image intensity is never strictly convex:
This makes our matching extremely hard, since the non-convexity means any distance function will have multiple critical points (candidate matches). How do you decide which one is the correct one?
To handle this problem, people proposed patch-based matching. Metrics like SAD, SSD and NCC were introduced to make the distance function as convex as possible; still, they are unable to handle large-scale repeated texture and low-texture regions.
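For concreteness, these metrics over two equally sized grey patches look roughly like this (a minimal NumPy sketch; the function names are mine):

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences: lower is a better match."""
    return np.abs(a.astype(np.float32) - b.astype(np.float32)).sum()

def ssd(a, b):
    """Sum of squared differences: lower is a better match."""
    d = a.astype(np.float32) - b.astype(np.float32)
    return (d * d).sum()

def ncc(a, b):
    """Normalized cross-correlation: closer to 1 is a better match."""
    a = a.astype(np.float32) - a.mean()
    b = b.astype(np.float32) - b.mean()
    return (a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
```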
To solve this, people started to search over a long range along the epipolar line, and found that the whole distribution of matching metrics can be described as a distribution along the depth.
The horizontal axis is depth and the vertical axis is the matching-metric score. This illustration leads us to the depth filter: we usually describe this distribution with a Gaussian (aka a Gaussian depth filter) and use this filter to describe the uncertainty of the depth. Combined with the patch-matching method, we can roughly get a proposal.
Now what? Let's use some optimization tools, like Gauss-Newton or gradient descent, to finally refine the depth estimation.
To sum up, the whole depth-estimation process follows these steps:
assume the depth at every pixel follows an initial Gaussian distribution
search along the epipolar line and reproject points into the target frame
triangulate the depth and calculate its uncertainty from the depth filter
run steps 2 and 3 again to get a new depth distribution and merge it with the previous one; if it has converged then stop, otherwise start again from step 2.
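A toy sketch of the Gaussian depth-filter update in the last step (plain Python; the measurements and thresholds are made up just to show the fusion rule):

```python
def fuse_depth(mu_prior, var_prior, mu_meas, var_meas):
    """Standard product-of-Gaussians fusion of two depth hypotheses."""
    var_post = (var_prior * var_meas) / (var_prior + var_meas)
    mu_post = (mu_prior * var_meas + mu_meas * var_prior) / (var_prior + var_meas)
    return mu_post, var_post

# Each pixel keeps a (mean, variance) depth estimate; every new triangulated
# measurement comes with its own variance from the matching uncertainty.
mu, var = 2.0, 1.0                                  # initial guess: 2 m, large uncertainty
for mu_meas, var_meas in [(1.6, 0.25), (1.55, 0.20), (1.58, 0.22)]:
    mu, var = fuse_depth(mu, var, mu_meas, var_meas)
    if var < 1e-2:                                  # converged: uncertainty small enough
        break
```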
I used OpenCV's cv::findHomography API to calculate the homography matrix of two planar images.
The matched keypoints are extracted by SIFT and matched by BFMatcher. As far as I know, cv::findHomography uses RANSAC iterations to find the best four corresponding points for computing the homography matrix.
So I drew the selected four pairs of points, together with the contour of the object's edge calculated using the homography matrix.
The results are in the links:
https://postimg.cc/image/5igwvfrx9/
As we can see, the matched points selected by RANSAC are correct, but the contour shows that the homography is not accurate.
But these tests show that both the selected matched points and the homography are correct:
https://postimg.cc/image/dvjnvtm53/
My guess is that if the selected matched points are too close together, the small errors in pixel position lead to a significant error in the homography matrix. If the four points are near the corners of the image, then a shift of the matched points by 4-6 pixels still gives a good homography matrix.
(Thinking in homogeneous coordinates, I believe this is reasonable, as a small error near the points is amplified far away from them.)
My question is:
1. Is my guess right?
2. Since the four matched points are selected by the RANSAC iterations, the overall error over all the keypoints should be minimal. But how can I get a stable homography, at least one that maps the contour correctly? In theory, once four corresponding points on a plane are found, the homography matrix can be calculated, but is there any trick to this in engineering practice?
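For reference, my pipeline is roughly equivalent to the following (Python/OpenCV sketch; the image names are placeholders and the Lowe ratio test is added here as a common filtering step):

```python
import cv2
import numpy as np

img1 = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)   # planar object
img2 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)    # image containing it

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # ratio test

src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
```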
I think you're right, and the proximity of the 4 points does not help the accuracy of the result. What you observe may be caused by numerical issues: the result may be locally correct for these 4 points but become worse when going further away.
However, RANSAC will not help you here. The reason is simple: RANSAC is a robust estimation procedure that was designed to find the best point pairs among many correspondences (including some wrong ones). Then, in the inner loop of the RANSAC, a standard homography estimation is performed.
You can see RANSAC as a way to reject wrong point correspondences that would provoke a bad result.
Back to your problem:
What you really need is to have more points. In your examples, you use only 4 point correspondences, which is just enough to estimate an homography.
You will improve your result by providing more matches all over the target image. The problem then becomes over-determined, but a least-squares solution can still be found by OpenCV. Furthermore, if there is some error either in the point-correspondence process or in some point localization, RANSAC will be able to select the best ones and still give you a reliable result.
If RANSAC results in overfitting on some 4 points (as it seems to be the case in your example), try to relax the constraint by increasing the ransacReprojThreshold parameter.
Alternatively, you can either:
use a different estimator (the robust median CV_LMEDS is a good choice if there are few matching errors)
or use RANSAC in a first step with a large reprojection error (to get a rough estimate) in order to detect the spurious matchings, then use LMEDS on the correct ones.
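A hedged sketch of these suggestions (Python/OpenCV; `src_pts`/`dst_pts` stand for all your matched keypoints, not just four, and the thresholds are illustrative):

```python
import cv2

def estimate_homography(src_pts, dst_pts):
    """src_pts/dst_pts: Nx1x2 float32 arrays of ALL matched keypoints."""
    # First pass: RANSAC with a relaxed reprojection threshold (rough estimate).
    H, inlier_mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC,
                                        ransacReprojThreshold=5.0)
    if H is None:
        return None
    # Optional second pass: LMEDS on the RANSAC inliers only.
    inl = inlier_mask.ravel().astype(bool)
    H_refined, _ = cv2.findHomography(src_pts[inl], dst_pts[inl], cv2.LMEDS)
    return H_refined if H_refined is not None else H
```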
Just to extend #sansuiso's answer, with which I agree:
If you provide around 100 correspondences to RANSAC, you are probably getting more than 4 inliers from cvFindHomography. Check the status output parameter.
To obtain a good homography, you should have many more than 4 correspondences (note that 4 correspondences always give you a homography), which are well distributed around the image and which are not collinear. You can actually use a minimum number of inliers to decide whether the homography obtained is good enough.
Note that RANSAC finds a set of points that are consistent, but its criterion for deciding that this set is the best one (the reprojection error) is a bit limited. There is a RANSAC-like method, called MSAC, that uses a slightly different error measurement; check it out.
The bad news, in my experience, is that it is unlikely you will obtain a 100%-precise homography most of the time. If you have several similar frames, you may see that the homography changes a little between them.
There are tricks to improve this. For example, after obtaining a homography with RANSAC, you can use it to project your model into the image, and look for new correspondences, so you can find another homography that should be more accurate.
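A rough sketch of that refinement loop (Python/OpenCV; `model_pts`, `detected_pts` and the search radius are placeholders you would adapt to your data):

```python
import cv2
import numpy as np

def refine_homography(H, model_pts, detected_pts, radius=10.0):
    """Project model keypoints with the current H, look for detections nearby,
    and re-estimate the homography from those new correspondences.

    model_pts: Nx2 model keypoints, detected_pts: Mx2 keypoints found in the image.
    """
    proj = cv2.perspectiveTransform(
        model_pts.reshape(-1, 1, 2).astype(np.float32), H).reshape(-1, 2)
    src, dst = [], []
    for p_model, p_proj in zip(model_pts, proj):
        d = np.linalg.norm(detected_pts - p_proj, axis=1)
        j = int(np.argmin(d))
        if d[j] < radius:                     # accept only nearby detections
            src.append(p_model)
            dst.append(detected_pts[j])
    if len(src) < 4:
        return H                              # not enough new correspondences
    H_new, _ = cv2.findHomography(np.float32(src), np.float32(dst), cv2.RANSAC, 3.0)
    return H_new if H_new is not None else H
```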
Your target has a lot of symmetric and similar elements. As other people mentioned (and you clarified later), the point spacing and the number of points can be a problem. Another problem is that SIFT is not designed to deal with the significant perspective distortions present in your case. Try to track your object through smaller rotations and, as was mentioned, reproject it using the latest homography to make it look as close as possible to the original. This will also allow you to skip the processing-heavy SIFT and use something as lightweight as FAST with cross-correlation of image patches for matching.
You may also eventually come to the understanding that using points is not enough. You have to use all that you've got, and this means lines or conics. If a homography transforms points as Pb = H * Pa, it is easy to verify that in homogeneous coordinates lines transform as Lb = Hinv.transposed * La. This follows directly from the equation La'.Pa = 0 = La' * Hinv * H * Pa = La' * Hinv * Pb = Lb'.Pb.
The possible minimal configurations are one line and three points, or three lines and one point. Two lines and two points doesn't work. You can use four lines or four points as well. Of course this means that you cannot use the OpenCV function anymore and have to write your own DLT and then a non-linear optimization.
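A quick numeric check of the line-transfer rule (Python/NumPy sketch with a random homography and a line defined by two random points):

```python
import numpy as np

H = np.random.rand(3, 3) + np.eye(3)              # some invertible homography
pa1, pa2 = np.random.rand(3), np.random.rand(3)   # homogeneous points in image a
la = np.cross(pa1, pa2)                           # line through them: la . p = 0

pb1, pb2 = H @ pa1, H @ pa2                       # transferred points
lb = np.linalg.inv(H).T @ la                      # transferred line, Lb = Hinv' * La

print(np.dot(lb, pb1), np.dot(lb, pb2))           # both ~ 0: points stay on the line
```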
I am trying to estimate the pose and position of a satellite given an image of it. I have a 3D model of the satellite. Using either PnP solvers or POSIT works great when I pick out the point correspondences myself; however, I need to find a method to match the points up automatically. Using a corner detector (the best one I found so far is based on the contour), I can find all the relevant points in the image, in addition to a few spurious points. However, I need to match a given point in the image to the correct point in the 3D model. The articles I have read on the subject always seem to assume that the point pairs have been found, without going into detail about how to do so.
Is there any approach usually taken that can determine these correspondences based on some invariant features? Or should I resort to a different method not based on corner points?
You can have a look at the SoftPOSIT algorithm, which determines 3D-2D correspondences and then executes the POSIT algorithm. As far as I know, Matlab code is available for SoftPOSIT.
You have to do PnP with RANSAC; see the OpenCV function solvePnPRansac(). This method can tolerate a high percentage of mismatches, so you don't need to be precise with all your matches but just need a certain percentage of correct ones (even as low as 30%). Of course, the minimum number of correct correspondences is 4.
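A minimal sketch of that route (Python/OpenCV; the 3D model points, the tentative 2D matches, the intrinsics `K` and the thresholds are placeholders you would fill from your model, your corner detector and your calibration):

```python
import cv2
import numpy as np

def pose_from_tentative_matches(object_pts, image_pts, K, dist=None):
    """object_pts: Nx3 model points, image_pts: Nx2 tentative image matches."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.float32(object_pts), np.float32(image_pts), K, dist,
        reprojectionError=8.0, confidence=0.99)
    if not ok or inliers is None or len(inliers) < 4:
        return None                      # too few consistent correspondences
    return rvec, tvec, inliers           # pose + indices of the matches that agree
```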
Speaking of invariant features: if the rotation between neighbouring frames is small, you don't need invariant features. Even a small patch of grey intensities would suffice to find a match. The only problem is that you have to update your descriptor, or even choose a different feature point on your model, depending on the model rotation. The latter may be hard to do, since you have to know the 3D coordinates of every feature.
I'm using findHomography on a list of points and sending the result to warpPerspective.
The problem is that sometimes the result is complete garbage and the resulting image is represented by weird gray rectangles.
How can I detect when findHomography sends me bad results?
There are several sanity tests you can perform on the output. Off the top of my head:
Compute the determinant of the homography, and see if it's too close to zero for comfort. Even better, compute its SVD, and verify that the ratio of the first-to-last singular value is sane (not too high). Either result will tell you whether the matrix is close to singular.
Compute the images of the image corners and of its center (i.e. the points you get when you apply the homography to those corners and center), and verify that they make sense, i.e. are they inside the image canvas (if you expect them to be)? Are they well separated from each other?
Plot in matlab/octave the output (data) points you fitted the homography to, along with their computed values from the input ones, using the homography, and verify that they are close (i.e. the error is low).
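A possible implementation of the first two checks (Python/OpenCV sketch; the thresholds are arbitrary illustrations and should be tuned for your data):

```python
import cv2
import numpy as np

def homography_sane(H, img_w, img_h, det_eps=1e-6, max_cond=1e7):
    if H is None:
        return False
    if abs(np.linalg.det(H)) < det_eps:        # near-singular matrix
        return False
    s = np.linalg.svd(H, compute_uv=False)
    if s[0] / s[-1] > max_cond:                # first-to-last singular value ratio
        return False
    # Map the image corners and centre and check that the results stay finite;
    # you could additionally check that they fall where you expect them.
    pts = np.float32([[0, 0], [img_w, 0], [img_w, img_h], [0, img_h],
                      [img_w / 2, img_h / 2]]).reshape(-1, 1, 2)
    mapped = cv2.perspectiveTransform(pts, H).reshape(-1, 2)
    return bool(np.all(np.isfinite(mapped)))
```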
A common mistake that leads to garbage results is incorrect ordering of the lists of input and output points, which makes the fitting routine work with wrong correspondences. Check that your indices are correct.
Understanding the degenerate homography cases is the key. You cannot get a good homography if your points are collinear or close to collinear, for example. Also, huge gray squares may indicate extreme scaling. Both cases may arise from the fact that there are very few inliers in your final homography calculation or the mapping is wrong.
To ensure that this never happens:
1. Make sure that points are well spread in both images.
2. Make sure that there are at least 10-30 correspondences (4 is enough if noise is small).
3. Make sure that points are correctly matched and the transformation is a homography.
To find bad homographies, apply the estimated H to your original points and check the separation from the expected points, i.e. |x2 - H*x1| < Tdist, where Tdist is your threshold for the distance error. If only a few points satisfy this threshold, your homography may be bad and you have probably violated one of the above-mentioned requirements.
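A small sketch of this test (Python/OpenCV; `Tdist` and the inlier fraction you require are up to you):

```python
import cv2
import numpy as np

def fraction_within_threshold(H, x1, x2, Tdist=3.0):
    """x1, x2: Nx2 arrays of corresponding points in the two images."""
    proj = cv2.perspectiveTransform(
        np.float32(x1).reshape(-1, 1, 2), H).reshape(-1, 2)
    err = np.linalg.norm(proj - np.float32(x2), axis=1)
    return float((err < Tdist).mean())

# e.g. reject the homography if fewer than ~50% of the points pass the test
```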
But this depends on the point-correspondences you use to compute the homography...
Just think that you are trying to find a transformation that maps lines to lines (from one plane to another), so not any possible configuration of point-correspondences will give you an homography that creates nice images.
It is even possible that the homography maps some of the points to the infinity.
I'm using the EMGU OpenCV wrapper for c#. I've got a disparity map being created nicely. However for my specific application I only need the disparity values of very few pixels, and I need them in real time. The calculation is taking about 100 ms now, I imagine that by getting disparity for hundreds of pixel values rather than thousands things would speed up considerably. I don't know much about what's going on "under the hood" of the stereo solver code, is there a way to speed things up by only calculating the disparity for the pixels that I need?
First of all, you fail to mention what you are really trying to accomplish and, moreover, what algorithm you are using. E.g. StereoGC is really slow (i.e. not real-time) but usually far more accurate than both StereoSGBM and StereoBM. Those last two can be used in real time, provided a few conditions are met:
The size of the input images is reasonably small;
You are not using an extravagant set of parameters (for instance, a larger value for numberOfDisparities will increase computation time).
Don't expect miracles when it comes to accuracy though.
Apart from that, there is the issue of "just a few pixels". As far as I understand, the algorithms implemented in OpenCV usually rely on information from more than one pixel to determine the disparity value. E.g. they need a neighborhood to detect which pixel from image A maps to which pixel in image B. As a result, in general it is not possible to just discard every other pixel of the image (by the way, if you already knew the locations in both images, you would not need the stereo methods at all). So unless you can discard a large border of your input images, for which you know that you'll never find your pixels of interest there, I'd say the answer to this part of your question is "no".
If you happen to know that your pixels of interest will always be within a certain rectangle of the input images, you can set the input images' ROIs (regions of interest) to that rectangle. Assuming OpenCV does not contain a bug here, this should speed up the computation a little.
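For example, something along these lines (a Python/OpenCV sketch rather than EmguCV, but the wrapper exposes the same classes; the left padding by `numDisparities` is my own assumption so that the matcher still has a search range at the left edge of the crop):

```python
import cv2
import numpy as np

def disparity_in_roi(left, right, x, y, w, h, num_disp=64, block=9):
    """Compute disparities only inside the (x, y, w, h) rectangle of a rectified pair."""
    pad = num_disp                          # extra columns so matching has support
    x0 = max(x - pad, 0)
    l_roi = left[y:y + h, x0:x + w]
    r_roi = right[y:y + h, x0:x + w]
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=num_disp,
                                 blockSize=block)
    disp = sgbm.compute(l_roi, r_roi).astype(np.float32) / 16.0   # fixed-point output
    return disp[:, x - x0:]                 # disparities for the original ROI columns
```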
With a bit of googling you can find real-time examples of finding stereo correspondences using EmguCV (or plain OpenCV) with the GPU on YouTube. Maybe this could help you.
Disclaimer: this may have been a more complete answer if your question contained more detail.