How is a homography calculated? - opencv

I am having quite a bit of trouble understanding the workings of plane to plane homography. In particular I would like to know how the opencv method works.
Is it like ray tracing? How does a homogeneous coordinate differ from a scale*vector?
Everything I read talks like you already know what they're talking about, so it's hard to grasp!

Googling homography estimation returns this as the first link (at least to me):
http://cseweb.ucsd.edu/classes/wi07/cse252a/homography_estimation/homography_estimation.pdf. And definitely this is a poor description and a lot has been omitted. If you want to learn these concepts reading a good book like Multiple View Geometry in Computer Vision would be far better than reading some short articles. Often these short articles have several serious mistakes, so be careful.
In short, a cost function is defined and the parameters (the elements of the homography matrix) that minimize this cost function are the answer we are looking for. A meaningful cost function is geometric, that is, it has a geometric interpretation. For the homography case, we want to find H such that by transforming points from one image to the other the distance between all the points and their correspondences be minimum. This geometric function is nonlinear, that means: 1-an iterative method should be used to solve it, in general, 2-an initial starting point is required for the iterative method. Here, algebraic cost functions enter. These cost functions have no meaningful/geometric interpretation. Often designing them is more of an art, and for a problem usually you can find several algebraic cost functions with different properties. The benefit of algebraic costs is that they lead to linear optimization problems, hence a closed form solution for them exists (that is a one shot /non-iterative method). But the downside is that the found solution is not optimal. Therefore, the general approach is to first optimize an algebraic cost and then use the found solution as starting point for an iterative geometric optimization. Now if you google for these cost functions for homography you will find how usually these are defined.
In case you want to know what method is used in OpenCV simply need to have a look at the code:
http://code.opencv.org/projects/opencv/repository/entry/trunk/opencv/modules/calib3d/src/fundam.cpp#L81
This is the algebraic function, DLT, defined in the mentioned book, if you google homography DLT should find some relevant documents. And then here:
http://code.opencv.org/projects/opencv/repository/entry/trunk/opencv/modules/calib3d/src/fundam.cpp#L165
An iterative procedure minimizes the geometric cost function.It seems the Gauss-Newton method is implemented:
http://en.wikipedia.org/wiki/Gauss%E2%80%93Newton_algorithm
All the above discussion assumes you have correspondences between two images. If some points are matched to incorrect points in the other image, then you have got outliers, and the results of the mentioned methods would be completely off. Robust (against outliers) methods enter here. OpenCV gives you two options: 1.RANSAC 2.LMeDS. Google is your friend here.
Hope that helps.

To answer your question we need to address 4 different questions:
1. Define homography.
2. See what happens when noise or outliers are present.
3. Find an approximate solution.
4. Refine it.
Homography in a 3x3 matrix that maps 2D points. The mapping is linear in homogeneous coordinates: [x2, y2, 1]’ ~ H * [x1, y1, 1]’, where ‘ means transpose (to write column vectors as rows) and ~ means that the mapping is up to scale. It is easier to see in Cartesian coordinates (multiplying nominator and denominator by the same factor doesn’t change the result)
x2 = (h11*x1 + h12*y1 + h13)/(h31*x1 + h32*y1 + h33)
y2 = (h21*x1 + h22*y1 + h23)/(h31*x1 + h32*y1 + h33)
You can see that in Cartesian coordinates the mapping is non-linear, but for now just keep this in mind.
We can easily solve a former set of linear equations in Homogeneous coordinates using least squares linear algebra methods (see DLT - Direct Linear Transform) but this unfortunately only minimizes an algebraic error in homography parameters. People care more about another kind of error - namely the error that shifts points around in Cartesian coordinate systems. If there is no noise and no outliers two erros can be identical. However the presence of noise requires us to minimize the residuals in Cartesian coordinates (residuals are just squared differences between the left and right sides of Cartesian equations). On top of that, a presence of outliers requires us to use a Robust method such as RANSAC. It selects the best set of inliers and rejects a few outliers to make sure they don’t contaminate our solution.
Since RANSAC finds correct inliers by random trial and error method over many iterations we need a really fast way to compute homography and this would be a linear approximation that minimizes parameters' error (wrong metrics) but otherwise is close enough to the final solution (that minimizes squared point coordinate residuals - a right metrics). We use a linear solution as a guess for further non-linear optimization;
The final step is to use our initial guess (solution of linear system that minimized Homography parameters) in solving non-linear equations (that minimize a sum of squared pixel errors). The reason to use squared residuals instead of their absolute values, for example, is because in Gaussian formula (describes noise) we have a squared exponent exp(x-mu)^2, so (skipping some probability formulas) maximum likelihood solutions requires squared residuals.
In order to perform a non-linear optimization one typically employs a Levenberg-Marquardt method. But in the first approximation one can just use a gradient descent (note that gradient points uphill but we are looking for a minimum thus we go against it, hence a minus sign below). In a nutshell, we go through a set of iterations 1..t..N selecting homography parameters at iteration t as param(t) = param(t-1) - k * gradient, where gradient = d_cost/d_param.
Bonus material: to further minimize the noise in your homography you can try a few tricks: reduce a search space for points (start tracking your points); use different features (lines, conics, etc. that are also transformed by homography but possibly have a higher SNR); reject impossible homographs to speed up RANSAC (e.g. those that correspond to ‘impossible’ point movements); use low pass filter for small changes in Homographies that may be attributed to noise.

Related

How to 3d reconstruct robustly from multiple images with known poses in OpenCV

The traditional solution for high resolution images examples :
extract features (dense) for all images
match features to find tracks through images
triangulate features to 3d points.
I can give two problem here for my case (many 640*480 images with small movements between each others) , first: matching is very slow , especially if the number of images is big, so a better solution can be optical flow tracking.., but it's getting sparse with big moves, ( a mix could solve the problem !!)
second: triangulate tracks , though it is over-determined problem, I find it hard to code a solution, .. (here am asking for simplifying what I read in references )
I searched quite a bit for libraries in that direction, with no useful result.
again, I have ground truth camera matrices and need only 3d positions as first estimate (without BA),
A coded software solution can be of great help as I don't need to reinvent the wheel, though a detailed instructions maybe helpful
this basically shows the underlying geometry for estimating the depth.
As you said, we have camera pose Q, and we are picking a point X from world, X_L is it's projection on left image, now, with Q_L, Q_R and X_L, we are able to make up this green colored epipolar plane, the rest job is easy, we search through points on line (Q_L, X), this line exactly describe the depth of X_L, with different assumptions: X1, X2,..., we can get different projections on the right image
Now we compare the pixel intensity difference from X_L and the reprojected point on right image, just pick the smallest one and that corresponding depth is exactly what we want.
Pretty easy hey? Truth is it's way harder, image is never strictly convex:
This makes our matching extremely hard, since the non-convex function will result any distance function have multiple critical points (candidate matches), how do you decide which one is the correct one?
However, people proposed path based match to handle this problem, methods like: SAD, SSD, NCC, they are introduced to create the distance function as convex as possible, still, they are unable to handle large scale repeated texture problem and low texture problem.
To solve this, people start to search over a long range in the epipolar line, and suddenly found that we can describe this whole distribution of matching metrics into a distance along the depth.
The horizontal axis is depth, and the vertical axis is matching metric score, and this illustration lead us found the depth filter, and we usually describe this distribution with gaussian, aka, gaussian depth filter, and use this filter to discribe the uncertainty of depth, combined with the patch matching method, we can roughly get a proposal.
Now what, let's use some optimization tools, like GN or gradient descent to finally refine the depth estimaiton.
To sum up, the total process of the depth estimation is like the following steps:
assume all depth in all pixel following a initial gaussian distribution
start search through epipolar line and reproject points into target frame
triangulate depth and calculate the uncertainty of the depth from depth filter
run 2 and 3 again to get a new depth distribution and merge with previous one, if they converged then break, ortherwise start again from 2.

Why calculate jacobians in ekf-slam

I know it is a very basic question but I want to know why do we calculate the Jacobian matrices in EKF-SLAM, I have tried so hard to understand this, well it won't be that hard but I want to know it. I was wondering if anyone could help me on this.
The Kalman filter operates on linear systems. The steps update two parts in parallel: The state x, and the error covariances P. In a linear system we predict the next x by Fx. It turns out that you can compute the exact covariance of Fx as FPF^T. In a non-linear system, we can update x as f(x), but how do we update P? There are two popular approaches:
In the EKF, we choose a linear approximation of f() at x, and then use the usual method FPF^T.
In the UKF, we build an approximation of the distribution of x with covariance P. The approximation is a set of points called sigma points. Then we propagate those states through our real f(sigma_point) and we measure the resulting distribution's variance.
You are concerned with the EKF (case 1). What is a good linear approximation of a function? If you zoom way in on a curve, it starts to look like a straight line, with a slope that's the derivative of the function at that point. If that sounds strange, look at Taylor series. The multi-variate equivalent is called the Jacobian. So we evaluate the Jacobian of f() at x to give us an F. Now Fx != f(x), but that's okay as long as the changes we're making to x are small (small enough that our approximated F wouldn't change much from before to after).
The main problem with the EKF approximation is that when we use the approximation to update the distributions after the measurement step, it tends to make the resulting covariance P too low. It acts like corrections "work" in a linear way. The actual update will depart slightly from the linear approximation and not be quite as good. These small amounts of overconfidence build up as the KF iterates and have to be offset by adding some fictitious process noise to Q.

Homography and projective transformation

im trying to write a code that will do projective transformation, but with more than 4 key points. i found this helpful guide but it uses 4 points of reference
https://math.stackexchange.com/questions/296794/finding-the-transform-matrix-from-4-projected-points-with-javascript
i know that matlab uses has a function tcp2form that handles that, but i haven't found a way so far.
anyone can give me some guidance, on how to do so? i can solve the equations using (least squares), but i'm stuck since i have a matrix that is larger than 3*3 and i can't multiple the homogeneous coordinates.
Thanks
If you have more than four control points, you have an overdetermined system of equations. There are two possible scenarios. Either your points are all compatible with the same transformation. In that case, any four points can be used, and the rest will match the transformation exactly. At least in theory. For the sake of numeric stability you'd probably want to choose your points so that they are far from being collinear.
Or your points are not all compatible with a single projective transformation. In this case, all you can hope for is an approximation. If you want the best approximation, you'll have to be more specific about what “best” means, i.e. some kind of error measure. Measuring things in a projective setup is inherently tricky, since there are usually a lot of arbitrary decisions involved.
What you can try is fixing one matrix entry (e.g. the lower right one to 1), then writing the conditions for the remaining 8 coordinates as a system of linear equations, and performing a least squares approximation. But the choice of matrix representative (i.e. fixing one entry here) affects the least squares error measure while it has no effect on the geometric meaning, so this is a pretty arbitrary choice. If the lower right entry of the desired matrix should happen to be zero, you'd computation will run into numeric problems due to overflow.

Can RANSAC be improved to remove outliers?

I am using SIFT feature detector and descriptor. I am matching the points between two images. I am using findHomography() function of OpenCV with the RANSAC method.
When I read about the RANSAC algorithm, it is said that adjusting a threshold parameter for RANSAC can improve the results. But I don't want to hardcode any parameter.
I know RANSAC is removing outliers in matches. Can anyone let me know if removing outliers (not all of them) with basic methods before applying homography improves the result of homography?
If so, how can we apply an operation before RANSAC to remove outliers?
What is your definition of a good result? RANSAC is about a tradeoff between the number of points and their precision, so there is no uniform definition of good: you have more inliers if their accuracy is worse and vice versa.
The parameter you are talking about is probably an outlier threshold and it may be just badly tuned so you have too many approximate inliers or too few super accurate inliers. Now, if you pre-filter your outliers you will just speed up your RANSAC but unlikely to improve the solution. Eventually the speed of RANSAC with Homography boils down to the probability of selecting 4 inliers and when their proportion is higher the convergence is more speedy.
The other methods to sort out outliers before applying RANSAC is to look at simpler constraints such as ordering of points, straight lines still being straight lines, cross-ratio and other invariants of Homography transformation. Finally you may want to use higher level features such as lines to calculate homography. Note, that in homogeneous coordinates when points are transformed as p2=H*p1, the lines are transformed as l2 = H-t * l1. This can actually increase the accuracy (since lines are macro features and are less noisy than point of interests) while straight lines can be detected via a Hough transform.
No, the whole point of RANSAC and related algorithms is to remove outliers.
However, it is possible to refine the algorithm in ways that avoid the definition of somewhat arbitrary threshold.
A good starting point is Torr's old MLESAC paper

How to improve the homography accuracy?

I used OpenCV's cv::findHomography API to calculate the homography matrix of two planar images.
The matched key points are extracted by SIFT and matched by BFMatcher. As I know, cv:findHomography use RANSAC iteration to find out the best four corresponding points to get the homography matrix.
So I draw the selected four pairs of points with the calculated contour using homograhy matrix of the edge of the object.
The result are as the links:
https://postimg.cc/image/5igwvfrx9/
As we can see, the selected matched points by RANSAC are correct, but the contour shows that the homography is not accurate.
But these test shows that, both the selected matched points and the homography are correct:
https://postimg.cc/image/dvjnvtm53/
My guess is that if the selected matched points are too close, the small error of the pixel position will lead to the significant error of the homography matrix. If the four points are in the corner of the image, then the shift of the matched points by 4-6 pixels still got good homography matrix.
(According the homogenous coordinate, I think it is reasonable, as the small error in the near plane will be amplified in the far away)
My question is:
1.Is my guess right?
2.Since the four matched points are generated by the RANSAC iteration, the overall error of all the keypoints are minimal. But How to get the stable homography, at least making the contour's mapping is correct? The theory proved that if the four corresponding points in a plane are found, the homography matrix should be calculated, but is there any trick in the engineer work?
I think you're right, and the proximity of the 4 points does not help the accuracy of the result. What you observe is maybe induced by numerical issues: the result may be locally correct for these 4 points but becomes worse when going further.
However, RANSAC will not help you here. The reason is simple: RANSAC is a robust estimation procedure that was designed to find the best point pairs among many correspondences (including some wrong ones). Then, in the inner loop of the RANSAC, a standard homography estimation is performed.
You can see RANSAC as a way to reject wrong point correspondences that would provoke a bad result.
Back to your problem:
What you really need is to have more points. In your examples, you use only 4 point correspondences, which is just enough to estimate an homography.
You will improve your result by providing more matches all over the target image. The problem then becomes over-determined, but a least squares solution can still be found by OpenCV. Furthermore, of there is some error either in the point correspondence process or in some point localization, RANSAC will be able to select the best ones and still give you a reliable result.
If RANSAC results in overfitting on some 4 points (as it seems to be the case in your example), try to relax the constraint by increasing the ransacReprojThreshold parameter.
Alternatively, you can either:
use a different estimator (the robust median CV_LMEDS is a good choice if there are few matching errors)
or use RANSAC in a first step with a large reprojection error (to get a rough estimate) in order to detect the spurious matchings then use LMEDS on the correct ones.
Just to extend #sansuiso's answer, with which I agree:
If you provide around 100 correspondences to RANSAC, probably you are getting more than 4 inliers from cvFindHomography. Check the status output parameter.
To obtain a good homography, you should have many more than 4 correspondences (note that 4 correspondences gives you an homography always), which are well distributed around the image and which are not linear. You can actually use a minimum number of inliers to decide whether the homography obtained is good enough.
Note that RANSAC finds a set of points that are consistent, but the way it has to say that that set is the best one (the reprojection error) is a bit limited. There is a RANSAC-like method, called MSAC, that uses a slightly different error measurement, check it out.
The bad news, in my experience, is that it is little likely to obtain a 100% precision homography most of the times. If you have several similar frames, it is possible that you see that homography changes a little between them.
There are tricks to improve this. For example, after obtaining a homography with RANSAC, you can use it to project your model into the image, and look for new correspondences, so you can find another homography that should be more accurate.
Your target has a lot of symmetric and similar elements. As other people mentioned (and you clarified later) the point spacing and point number can be a problem. Another problem is that SIFT is not designed to deal with significant perspective distortions that are present in your case. Try to track your object through smaller rotations and as was mentioned reproject it using the latest homography to make it look as close as possible to the original. This will also allow you to skip processing heavy SIFT and to use something as lightweight as FAST with cross correlation of image patches for matching.
You also may eventually come to understanding that using points is not enough. You have to use all that you got and this means lines or conics. If a homography transforms a point Pb = H* Pa it is easy to verify that in homogeneous coordinates line Lb = Henv.transposed * La. this directly follows from the equation La’.Pa = 0 = La’ * Hinv * H * Pa = La’ * Hinv * Pb = Lb’.Pb
The possible min. configurations is 1 line and three points or three lines and one point. Two lines and two points doesn’t work. You can use four lines or four points as well. Of course this means that you cannot use the openCV function anymore and has to write your own DLT and then non-linear optimization.

Resources