Can RANSAC be improved to remove outliers? - opencv

I am using SIFT feature detector and descriptor. I am matching the points between two images. I am using findHomography() function of OpenCV with the RANSAC method.
When I read about the RANSAC algorithm, it is said that adjusting a threshold parameter for RANSAC can improve the results. But I don't want to hardcode any parameter.
I know RANSAC is removing outliers in matches. Can anyone let me know if removing outliers (not all of them) with basic methods before applying homography improves the result of homography?
If so, how can we apply an operation before RANSAC to remove outliers?

What is your definition of a good result? RANSAC is about a tradeoff between the number of points and their precision, so there is no uniform definition of good: you have more inliers if their accuracy is worse and vice versa.
The parameter you are talking about is probably the outlier threshold, and it may simply be badly tuned, so you end up with too many approximate inliers or too few super-accurate inliers. Now, if you pre-filter your outliers you will just speed up your RANSAC, but you are unlikely to improve the solution. Ultimately, the speed of RANSAC with a homography boils down to the probability of selecting 4 inliers, and when their proportion is higher the convergence is faster.
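The relationship between the inlier ratio and speed can be made concrete with the standard back-of-the-envelope formula (a textbook result, not anything OpenCV-specific) for the number of iterations needed to draw, with a chosen confidence, at least one all-inlier sample of 4 points:

```python
import math

def ransac_iterations(inlier_ratio, sample_size=4, confidence=0.99):
    """Iterations needed so that, with probability `confidence`, at least
    one random sample of `sample_size` points contains only inliers."""
    # Probability that one random sample is all inliers.
    p_good_sample = inlier_ratio ** sample_size
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good_sample))

# Pre-filtering outliers raises the inlier ratio and cuts iterations sharply:
print(ransac_iterations(0.5))   # 50% inliers -> 72 iterations
print(ransac_iterations(0.8))   # 80% inliers -> 9 iterations
```

This is why pre-filtering pays off in speed even though the final model quality is set by the threshold, not by the filtering.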
Other methods to sort out outliers before applying RANSAC are to look at simpler constraints such as the ordering of points, straight lines remaining straight lines, the cross-ratio, and other invariants of the homography transformation. Finally, you may want to use higher-level features such as lines to calculate the homography. Note that in homogeneous coordinates, when points are transformed as p2 = H * p1, lines are transformed as l2 = H^-T * l1. This can actually increase the accuracy (since lines are macro features and are less noisy than points of interest), and straight lines can be detected via a Hough transform.
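The line-transformation rule is easy to sanity-check numerically. A minimal NumPy sketch (the homography values here are made up): transform two points with H, transform the line through them with H^-T, and verify the transformed points still lie on the transformed line:

```python
import numpy as np

# An arbitrary invertible homography (hypothetical values).
H = np.array([[1.2,   0.1,   5.0],
              [-0.2,  0.9,   3.0],
              [0.001, 0.002, 1.0]])

# A line in homogeneous coordinates through points p and q: l1 = p x q.
p = np.array([10.0, 20.0, 1.0])
q = np.array([80.0, 50.0, 1.0])
l1 = np.cross(p, q)

# Transform the points with H, and the line with H^-T.
p2, q2 = H @ p, H @ q
l2 = np.linalg.inv(H).T @ l1

# The transformed points must lie on the transformed line: l2 . p2 = 0.
print(abs(l2 @ p2) < 1e-9, abs(l2 @ q2) < 1e-9)  # True True
```

The identity works because l2 . p2 = (H^-T l1) . (H p) = l1 . p = 0 for any invertible H.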

No, the whole point of RANSAC and related algorithms is to remove outliers.
However, it is possible to refine the algorithm in ways that avoid defining a somewhat arbitrary threshold.
A good starting point is Torr's old MLESAC paper.

Related

Measure to separate positive & negative examples using SIFT keypoint object detection?

I used SIFT keypoint descriptors for detecting objects in an image. For that, I used best matches and calculated homography matrix.
Using this homography matrix, I found where the object lies in test image.
Now, for samples where the object could not be found (which currently has to be checked manually), what measure could help distinguish between negative and positive samples?
Presently we are separating the samples using the determinant of the homography matrix. Is there a better measure?
You may use the number of (filtered) point correspondences as a measure to distinguish between negative and positive samples, because positive samples generally have many more point correspondences than negative samples.

How to evaluate distortion correction algorithm

I am trying to evaluate distortion correction by a line-fitting algorithm. Now I want to make a decision whether a given image is corrected or not. Should I consider the RMSE over the whole image, given that I get an RMSE for each line in the image? Please suggest how to make this decision.
Afraid you are doing it all wrong, sorry. The mantra that "a well calibrated camera maps straight lines in the world to straight lines in the image", while true, does not lend itself to a well-posed definition of a metric for the quality of your calibration. You can compute an RMSE on straight lines in various ways, but they are all unprincipled hacks.
You can only define the RMSE error for the entire model of the projection from 3D points to their image. In other words, it only makes sense to speak of RMSE when you are doing bundle adjustment, solving jointly for the pose of the camera, and the linear and nonlinear intrinsic parameters of the lens. This is what you do when you calibrate a camera, or solve a structure-from-motion problem by bundle adjustment.
While it is theoretically true that a perfect estimation of the nonlinear lens distortion parameters "straightens" lines perfectly, it is quite tricky to use just this fact in order to define a metric for the quality of practically estimated distortion parameters. There are several reasons for this, among them:
When you apply a least-squares straight-line fitting algorithm to points obtained by un-distorting with erroneous parameters, you are using a wrong model. Applying the un-distortion function itself to the image of a physical straight 3D line produces a curve in the image which is only straight if the parameters are well estimated. When they aren't, your line fit will be biased, which means that the distance between the curve and the straight line it "fits" is not a purely random variable: it depends on where you measure it, and on where the curve itself is located in the image.
It is tricky to define a distance between a curve and a straight line that supposedly fits it. How do you choose which point on the line corresponds to a given point on the curve, or vice versa?
A more principled approach would be to define an error measure based on the geometrical curvature of the curve resulting by un-distorting the image of a physical straight line. However, attempting to accurately measure curvature opens a can of worms by itself, since it amounts to estimating (explicitly or not) the first and second derivatives of the curve, which amplifies noise.
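The noise-amplification point can be illustrated with a toy experiment (synthetic data, finite differences as a deliberately naive curvature estimator): even a perfectly straight line contaminated with sub-pixel noise yields curvature estimates far from zero.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 100.0, 200)
# A perfectly straight line, plus 0.2 px of simulated detection noise:
y = 0.5 * x + 3.0 + rng.normal(0.0, 0.2, size=x.size)

# Naive curvature k = |y''| / (1 + y'^2)^(3/2) from finite differences.
dy = np.gradient(y, x)
d2y = np.gradient(dy, x)
k = np.abs(d2y) / (1.0 + dy ** 2) ** 1.5

print(k.max())  # far from the true curvature, which is exactly 0
```

Differentiating twice divides the noise by the squared sample spacing, which is exactly the amplification the answer warns about; any practical curvature-based metric would need heavy smoothing or a parametric fit first.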
So, all in all, either one of your suggestions "works" in the sense that it gives you a number that, if small, is "suggestive" of a likely good calibration. However, neither is a "correct" choice, because the basis of what you are trying to measure (and define an error for) is shaky.

Given a feature vector, how to find whether my data points are linearly separable

I have a feature vector in matrix notation, and I have data points in the 2D plane. How can I find out whether my data points are linearly separable with that feature vector?
In 2D one can check whether there exists a line that divides the data points into two classes. But how can linear separability be checked in higher dimensions?
A theoretical answer
If we assume the samples of the two classes are distributed according to a Gaussian, we will get a quadratic function describing the decision boundary in the general case.
If the covariance matrices are identical we get a linear decision boundary.
A practical answer
See the SO discussion here.
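Another practical check (a sketch from scratch, not a library routine): run the perceptron algorithm, which provably converges after finitely many updates if and only if the data are linearly separable. With an iteration cap, failing to converge is only strong evidence of non-separability, not a proof. This works in any dimension:

```python
import numpy as np

def linearly_separable(X, y, max_epochs=1000):
    """Perceptron-based separability test for labels y in {-1, +1}.
    Returns True once a full pass makes no mistakes (a separating
    hyperplane was found); hitting max_epochs suggests, but does not
    prove, that the data are not linearly separable."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias coordinate
    w = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(Xb, y):
            if yi * (w @ xi) <= 0:  # misclassified or on the boundary
                w += yi * xi        # perceptron update
                errors += 1
        if errors == 0:             # clean pass: w separates the data
            return True
    return False

# A separable toy set, and the classic XOR configuration (not separable):
X_sep = np.array([[0.0, 0], [1, 0], [3, 3], [4, 2]])
y_sep = np.array([-1, -1, 1, 1])
X_xor = np.array([[0.0, 0], [1, 1], [0, 1], [1, 0]])
y_xor = np.array([-1, -1, 1, 1])
print(linearly_separable(X_sep, y_sep))  # True
print(linearly_separable(X_xor, y_xor))  # False
```

For a definitive answer one can instead pose separability as a linear program (feasibility of w, b with y_i(w.x_i + b) >= 1), but the perceptron is the simplest thing to try.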

How to improve the homography accuracy?

I used OpenCV's cv::findHomography API to calculate the homography matrix of two planar images.
The matched keypoints are extracted by SIFT and matched by BFMatcher. As far as I know, cv::findHomography uses RANSAC iterations to find the best four corresponding points from which to compute the homography matrix.
So I draw the four selected pairs of points, together with the contour of the object's edge computed using the homography matrix.
The results are in the links:
https://postimg.cc/image/5igwvfrx9/
As we can see, the selected matched points by RANSAC are correct, but the contour shows that the homography is not accurate.
But this test shows that both the selected matched points and the homography are correct:
https://postimg.cc/image/dvjnvtm53/
My guess is that if the selected matched points are too close together, a small error in pixel position will lead to a significant error in the homography matrix. If the four points are near the corners of the image, then shifting the matched points by 4-6 pixels still gives a good homography matrix.
(Reasoning in homogeneous coordinates, I think this is plausible, as a small error in the near plane is amplified far away.)
My question is:
1. Is my guess right?
2. Since the four matched points are generated by the RANSAC iteration, the overall error of all the keypoints is minimal. But how can I get a stable homography, at least one that maps the contour correctly? Theory says that once four corresponding points in a plane are found, the homography matrix can be calculated, but are there any tricks in engineering practice?
I think you're right: the proximity of the 4 points does not help the accuracy of the result. What you observe is probably caused by numerical issues: the result may be locally correct for these 4 points but become worse farther away from them.
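This extrapolation effect can be reproduced with a toy experiment (a bare-bones, unnormalized DLT written from scratch, not OpenCV's implementation): applying the same ~2 px noise to 4 clustered points versus 4 well-spread points gives wildly different errors at the image corners.

```python
import numpy as np

def dlt_homography(src, dst):
    """Estimate H from 4+ point pairs with an unnormalized DLT: stack two
    rows of A h = 0 per correspondence, then take the singular vector of
    the smallest singular value."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(rows, float))
    return Vt[-1].reshape(3, 3)

def transfer(H, pts):
    """Apply a homography to Cartesian points."""
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:]

# Ground truth is the identity mapping; apply the SAME noise to both cases.
noise = np.array([[2, -1], [-1, 2], [1, 1], [-2, -2]], float)
clustered = np.array([[300, 220], [330, 220], [300, 250], [330, 250]], float)
spread = np.array([[0, 0], [640, 0], [0, 480], [640, 480]], float)

H_c = dlt_homography(clustered, clustered + noise)
H_s = dlt_homography(spread, spread + noise)

# Measure how far each estimate drifts at the image corners:
corners = np.array([[0, 0], [640, 0], [0, 480], [640, 480]], float)
err_c = np.abs(transfer(H_c, corners) - corners).max()
err_s = np.abs(transfer(H_s, corners) - corners).max()
print(err_c > err_s)  # True: the clustered points extrapolate much worse
```

Both estimates fit their own 4 points exactly; the difference is entirely in how the same pixel noise is amplified away from the sample points.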
However, RANSAC will not help you here. The reason is simple: RANSAC is a robust estimation procedure that was designed to find the best point pairs among many correspondences (including some wrong ones). Then, in the inner loop of the RANSAC, a standard homography estimation is performed.
You can see RANSAC as a way to reject wrong point correspondences that would provoke a bad result.
Back to your problem:
What you really need is to have more points. In your examples, you use only 4 point correspondences, which is just enough to estimate a homography.
You will improve your result by providing more matches all over the target image. The problem then becomes over-determined, but a least-squares solution can still be found by OpenCV. Furthermore, if there is some error either in the point correspondence process or in some point localization, RANSAC will be able to select the best ones and still give you a reliable result.
If RANSAC results in overfitting on some 4 points (as it seems to be the case in your example), try to relax the constraint by increasing the ransacReprojThreshold parameter.
Alternatively, you can either:
use a different estimator (the robust median CV_LMEDS is a good choice if there are few matching errors)
or use RANSAC in a first step with a large reprojection error (to get a rough estimate), in order to detect the spurious matches, then use LMEDS on the correct ones.
Just to extend #sansuiso's answer, with which I agree:
If you provide around 100 correspondences to RANSAC, probably you are getting more than 4 inliers from cvFindHomography. Check the status output parameter.
To obtain a good homography, you should have many more than 4 correspondences (note that 4 correspondences always give you a homography), which are well distributed around the image and which are not collinear. You can actually use a minimum number of inliers to decide whether the obtained homography is good enough.
Note that RANSAC finds a set of points that are consistent, but the way it decides that this set is the best one (the reprojection error) is a bit limited. There is a RANSAC-like method, called MSAC, that uses a slightly different error measure; check it out.
The bad news, in my experience, is that you are unlikely to obtain a 100%-precision homography most of the time. If you have several similar frames, you may see that the homography changes a little between them.
There are tricks to improve this. For example, after obtaining a homography with RANSAC, you can use it to project your model into the image, and look for new correspondences, so you can find another homography that should be more accurate.
Your target has a lot of symmetric and similar elements. As other people mentioned (and you clarified later), the point spacing and the number of points can be a problem. Another problem is that SIFT is not designed to deal with the significant perspective distortions that are present in your case. Try to track your object through smaller rotations and, as was mentioned, reproject it using the latest homography to make it look as close as possible to the original. This will also allow you to skip the processing-heavy SIFT and to use something as lightweight as FAST with cross-correlation of image patches for matching.
You may also eventually come to the understanding that using points is not enough. You have to use everything you've got, and this means lines or conics. If a homography transforms a point as Pb = H * Pa, it is easy to verify that in homogeneous coordinates a line transforms as Lb = Hinv.transposed * La. This directly follows from the equation La'.Pa = 0 = La' * Hinv * H * Pa = La' * Hinv * Pb = Lb'.Pb.
The possible minimal configurations are one line and three points, or three lines and one point; two lines and two points don't work. You can use four lines or four points as well. Of course, this means that you cannot use the OpenCV function anymore and have to write your own DLT and then a non-linear optimization.

How is a homography calculated?

I am having quite a bit of trouble understanding the workings of plane to plane homography. In particular I would like to know how the opencv method works.
Is it like ray tracing? How does a homogeneous coordinate differ from a scale*vector?
Everything I read talks like you already know what they're talking about, so it's hard to grasp!
Googling homography estimation returns this as the first link (at least to me):
http://cseweb.ucsd.edu/classes/wi07/cse252a/homography_estimation/homography_estimation.pdf. And definitely this is a poor description and a lot has been omitted. If you want to learn these concepts reading a good book like Multiple View Geometry in Computer Vision would be far better than reading some short articles. Often these short articles have several serious mistakes, so be careful.
In short, a cost function is defined, and the parameters (the elements of the homography matrix) that minimize this cost function are the answer we are looking for. A meaningful cost function is geometric, that is, it has a geometric interpretation. For the homography case, we want to find H such that, by transforming points from one image to the other, the distance between all the points and their correspondences becomes minimal. This geometric cost function is nonlinear, which means: (1) an iterative method is generally needed to solve it, and (2) the iterative method requires an initial starting point.

Here, algebraic cost functions enter. These cost functions have no meaningful/geometric interpretation. Designing them is often more of an art, and for a given problem you can usually find several algebraic cost functions with different properties. The benefit of algebraic costs is that they lead to linear optimization problems, hence a closed-form solution exists (that is, a one-shot, non-iterative method). The downside is that the found solution is not optimal.

Therefore, the general approach is to first optimize an algebraic cost and then use the found solution as the starting point for an iterative geometric optimization. If you google these cost functions for the homography you will find how they are usually defined.
In case you want to know what method is used in OpenCV simply need to have a look at the code:
http://code.opencv.org/projects/opencv/repository/entry/trunk/opencv/modules/calib3d/src/fundam.cpp#L81
This is the algebraic function, DLT, defined in the mentioned book, if you google homography DLT should find some relevant documents. And then here:
http://code.opencv.org/projects/opencv/repository/entry/trunk/opencv/modules/calib3d/src/fundam.cpp#L165
An iterative procedure minimizes the geometric cost function. It seems the Gauss-Newton method is implemented:
http://en.wikipedia.org/wiki/Gauss%E2%80%93Newton_algorithm
All the above discussion assumes you have correspondences between two images. If some points are matched to incorrect points in the other image, then you have outliers, and the results of the mentioned methods will be completely off. Robust (against outliers) methods enter here. OpenCV gives you two options: 1. RANSAC, 2. LMeDS. Google is your friend here.
Hope that helps.
To answer your question we need to address 4 different questions:
1. Define homography.
2. See what happens when noise or outliers are present.
3. Find an approximate solution.
4. Refine it.
A homography is a 3x3 matrix that maps 2D points. The mapping is linear in homogeneous coordinates: [x2, y2, 1]' ~ H * [x1, y1, 1]', where ' means transpose (to write column vectors as rows) and ~ means that the mapping is up to scale. It is easier to see in Cartesian coordinates (multiplying the numerator and denominator by the same factor doesn't change the result):
x2 = (h11*x1 + h12*y1 + h13)/(h31*x1 + h32*y1 + h33)
y2 = (h21*x1 + h22*y1 + h23)/(h31*x1 + h32*y1 + h33)
You can see that in Cartesian coordinates the mapping is non-linear, but for now just keep this in mind.
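The equivalence of the two forms is quick to check numerically. A small sketch (the H values are made up): dividing the homogeneous result by its last component reproduces the two Cartesian formulas above.

```python
import numpy as np

H = np.array([[1.1,   0.2, 10.0],
              [0.0,   0.9, -5.0],
              [0.001, 0.0,  1.0]])  # hypothetical homography
x1, y1 = 50.0, 30.0

# Homogeneous mapping: [x2, y2, w]' ~ H [x1, y1, 1]'
x2h, y2h, w = H @ np.array([x1, y1, 1.0])

# Cartesian form, written out explicitly as in the equations above:
den = H[2, 0] * x1 + H[2, 1] * y1 + H[2, 2]
x2 = (H[0, 0] * x1 + H[0, 1] * y1 + H[0, 2]) / den
y2 = (H[1, 0] * x1 + H[1, 1] * y1 + H[1, 2]) / den

print(np.isclose(x2, x2h / w), np.isclose(y2, y2h / w))  # True True
```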
We can easily solve the former set of linear equations in homogeneous coordinates using least-squares linear algebra methods (see DLT - Direct Linear Transform), but this unfortunately only minimizes an algebraic error in the homography parameters. People care more about another kind of error - namely the error that shifts points around in the Cartesian coordinate system. If there is no noise and no outliers, the two errors can be identical. However, the presence of noise requires us to minimize the residuals in Cartesian coordinates (residuals are just the squared differences between the left and right sides of the Cartesian equations). On top of that, the presence of outliers requires us to use a robust method such as RANSAC. It selects the best set of inliers and rejects the outliers to make sure they don't contaminate our solution.
Since RANSAC finds correct inliers by random trial and error over many iterations, we need a really fast way to compute the homography, and this is the linear approximation that minimizes the error in the parameters (the wrong metric) but is otherwise close enough to the final solution (which minimizes the squared point-coordinate residuals - the right metric). We use the linear solution as a guess for further non-linear optimization.
The final step is to use our initial guess (the solution of the linear system that minimized the error in the homography parameters) to solve the non-linear equations (which minimize the sum of squared pixel errors). The reason to use squared residuals instead of, for example, their absolute values is that the Gaussian formula (which describes the noise) has a squared exponent, exp(-(x-mu)^2 / (2*sigma^2)), so (skipping some probability formulas) the maximum-likelihood solution requires squared residuals.
In order to perform a non-linear optimization one typically employs a Levenberg-Marquardt method. But in the first approximation one can just use a gradient descent (note that gradient points uphill but we are looking for a minimum thus we go against it, hence a minus sign below). In a nutshell, we go through a set of iterations 1..t..N selecting homography parameters at iteration t as param(t) = param(t-1) - k * gradient, where gradient = d_cost/d_param.
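The refinement step can be sketched on synthetic data (all numbers made up; a real implementation would use analytic Jacobians and Levenberg-Marquardt rather than plain numerical gradient descent): start from a perturbed guess, standing in for the linear (DLT) solution, and descend on the sum of squared Cartesian residuals with backtracking on the step size.

```python
import numpy as np

# Synthetic ground truth: a known homography and noiseless correspondences.
H_true = np.array([[1.0,  0.1,  4.0],
                   [0.05, 1.1, -3.0],
                   [1e-4, 2e-4, 1.0]])
rng = np.random.default_rng(0)
src = rng.uniform(0.0, 400.0, size=(30, 2))

def transfer(H, pts):
    """Apply a homography to Cartesian points."""
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    with np.errstate(divide="ignore", invalid="ignore"):
        return ph[:, :2] / ph[:, 2:]

dst = transfer(H_true, src)

def cost(h8):
    # H parametrized by its first 8 entries, h33 fixed to 1 (gauge fixing).
    H = np.append(h8, 1.0).reshape(3, 3)
    r = transfer(H, src) - dst
    return float((r ** 2).sum())

def grad(h8, eps=1e-7):
    # Central-difference gradient, with eps scaled per parameter.
    g = np.zeros_like(h8)
    for i in range(len(h8)):
        d = np.zeros_like(h8)
        d[i] = eps * max(1.0, abs(h8[i]))
        g[i] = (cost(h8 + d) - cost(h8 - d)) / (2 * d[i])
    return g

# Initial guess: the true parameters with the translations perturbed.
h = H_true.ravel()[:8].copy()
h[2] += 0.5
h[5] -= 0.5

c0 = c = cost(h)
for _ in range(100):
    g = grad(h)
    if np.linalg.norm(g) < 1e-9:
        break
    t = 1e-3
    while t > 1e-20:                 # backtracking line search:
        cn = cost(h - t * g)         # shrink the step until the cost
        if np.isfinite(cn) and cn < c:  # strictly decreases
            break
        t *= 0.5
    if t <= 1e-20:
        break
    h -= t * g
    c = cn
print(c < c0)  # True: the refinement reduced the reprojection error
```

The backtracking matters because the parameters have very different scales (translations versus the projective row), which is also why practical solvers prefer Levenberg-Marquardt over plain gradient descent.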
Bonus material: to further reduce the noise in your homography you can try a few tricks: reduce the search space for points (start tracking your points); use different features (lines, conics, etc., which are also transformed by a homography but possibly have a higher SNR); reject impossible homographies to speed up RANSAC (e.g. those that correspond to 'impossible' point movements); use a low-pass filter for small changes in homographies that may be attributed to noise.
