Homography and projective transformation - image-processing

im trying to write a code that will do projective transformation, but with more than 4 key points. i found this helpful guide but it uses 4 points of reference
https://math.stackexchange.com/questions/296794/finding-the-transform-matrix-from-4-projected-points-with-javascript
i know that matlab uses has a function tcp2form that handles that, but i haven't found a way so far.
anyone can give me some guidance, on how to do so? i can solve the equations using (least squares), but i'm stuck since i have a matrix that is larger than 3*3 and i can't multiple the homogeneous coordinates.
Thanks

If you have more than four control points, you have an overdetermined system of equations. There are two possible scenarios. Either your points are all compatible with the same transformation. In that case, any four points can be used, and the rest will match the transformation exactly. At least in theory. For the sake of numeric stability you'd probably want to choose your points so that they are far from being collinear.
Or your points are not all compatible with a single projective transformation. In this case, all you can hope for is an approximation. If you want the best approximation, you'll have to be more specific about what “best” means, i.e. some kind of error measure. Measuring things in a projective setup is inherently tricky, since there are usually a lot of arbitrary decisions involved.
What you can try is fixing one matrix entry (e.g. the lower right one to 1), then writing the conditions for the remaining 8 coordinates as a system of linear equations, and performing a least squares approximation. But the choice of matrix representative (i.e. fixing one entry here) affects the least squares error measure while it has no effect on the geometric meaning, so this is a pretty arbitrary choice. If the lower right entry of the desired matrix should happen to be zero, you'd computation will run into numeric problems due to overflow.

Related

How to 3d reconstruct robustly from multiple images with known poses in OpenCV

The traditional solution for high resolution images examples :
extract features (dense) for all images
match features to find tracks through images
triangulate features to 3d points.
I can give two problem here for my case (many 640*480 images with small movements between each others) , first: matching is very slow , especially if the number of images is big, so a better solution can be optical flow tracking.., but it's getting sparse with big moves, ( a mix could solve the problem !!)
second: triangulate tracks , though it is over-determined problem, I find it hard to code a solution, .. (here am asking for simplifying what I read in references )
I searched quite a bit for libraries in that direction, with no useful result.
again, I have ground truth camera matrices and need only 3d positions as first estimate (without BA),
A coded software solution can be of great help as I don't need to reinvent the wheel, though a detailed instructions maybe helpful
this basically shows the underlying geometry for estimating the depth.
As you said, we have camera pose Q, and we are picking a point X from world, X_L is it's projection on left image, now, with Q_L, Q_R and X_L, we are able to make up this green colored epipolar plane, the rest job is easy, we search through points on line (Q_L, X), this line exactly describe the depth of X_L, with different assumptions: X1, X2,..., we can get different projections on the right image
Now we compare the pixel intensity difference from X_L and the reprojected point on right image, just pick the smallest one and that corresponding depth is exactly what we want.
Pretty easy hey? Truth is it's way harder, image is never strictly convex:
This makes our matching extremely hard, since the non-convex function will result any distance function have multiple critical points (candidate matches), how do you decide which one is the correct one?
However, people proposed path based match to handle this problem, methods like: SAD, SSD, NCC, they are introduced to create the distance function as convex as possible, still, they are unable to handle large scale repeated texture problem and low texture problem.
To solve this, people start to search over a long range in the epipolar line, and suddenly found that we can describe this whole distribution of matching metrics into a distance along the depth.
The horizontal axis is depth, and the vertical axis is matching metric score, and this illustration lead us found the depth filter, and we usually describe this distribution with gaussian, aka, gaussian depth filter, and use this filter to discribe the uncertainty of depth, combined with the patch matching method, we can roughly get a proposal.
Now what, let's use some optimization tools, like GN or gradient descent to finally refine the depth estimaiton.
To sum up, the total process of the depth estimation is like the following steps:
assume all depth in all pixel following a initial gaussian distribution
start search through epipolar line and reproject points into target frame
triangulate depth and calculate the uncertainty of the depth from depth filter
run 2 and 3 again to get a new depth distribution and merge with previous one, if they converged then break, ortherwise start again from 2.

How to improve the homography accuracy?

I used OpenCV's cv::findHomography API to calculate the homography matrix of two planar images.
The matched key points are extracted by SIFT and matched by BFMatcher. As I know, cv:findHomography use RANSAC iteration to find out the best four corresponding points to get the homography matrix.
So I draw the selected four pairs of points with the calculated contour using homograhy matrix of the edge of the object.
The result are as the links:
https://postimg.cc/image/5igwvfrx9/
As we can see, the selected matched points by RANSAC are correct, but the contour shows that the homography is not accurate.
But these test shows that, both the selected matched points and the homography are correct:
https://postimg.cc/image/dvjnvtm53/
My guess is that if the selected matched points are too close, the small error of the pixel position will lead to the significant error of the homography matrix. If the four points are in the corner of the image, then the shift of the matched points by 4-6 pixels still got good homography matrix.
(According the homogenous coordinate, I think it is reasonable, as the small error in the near plane will be amplified in the far away)
My question is:
1.Is my guess right?
2.Since the four matched points are generated by the RANSAC iteration, the overall error of all the keypoints are minimal. But How to get the stable homography, at least making the contour's mapping is correct? The theory proved that if the four corresponding points in a plane are found, the homography matrix should be calculated, but is there any trick in the engineer work?
I think you're right, and the proximity of the 4 points does not help the accuracy of the result. What you observe is maybe induced by numerical issues: the result may be locally correct for these 4 points but becomes worse when going further.
However, RANSAC will not help you here. The reason is simple: RANSAC is a robust estimation procedure that was designed to find the best point pairs among many correspondences (including some wrong ones). Then, in the inner loop of the RANSAC, a standard homography estimation is performed.
You can see RANSAC as a way to reject wrong point correspondences that would provoke a bad result.
Back to your problem:
What you really need is to have more points. In your examples, you use only 4 point correspondences, which is just enough to estimate an homography.
You will improve your result by providing more matches all over the target image. The problem then becomes over-determined, but a least squares solution can still be found by OpenCV. Furthermore, of there is some error either in the point correspondence process or in some point localization, RANSAC will be able to select the best ones and still give you a reliable result.
If RANSAC results in overfitting on some 4 points (as it seems to be the case in your example), try to relax the constraint by increasing the ransacReprojThreshold parameter.
Alternatively, you can either:
use a different estimator (the robust median CV_LMEDS is a good choice if there are few matching errors)
or use RANSAC in a first step with a large reprojection error (to get a rough estimate) in order to detect the spurious matchings then use LMEDS on the correct ones.
Just to extend #sansuiso's answer, with which I agree:
If you provide around 100 correspondences to RANSAC, probably you are getting more than 4 inliers from cvFindHomography. Check the status output parameter.
To obtain a good homography, you should have many more than 4 correspondences (note that 4 correspondences gives you an homography always), which are well distributed around the image and which are not linear. You can actually use a minimum number of inliers to decide whether the homography obtained is good enough.
Note that RANSAC finds a set of points that are consistent, but the way it has to say that that set is the best one (the reprojection error) is a bit limited. There is a RANSAC-like method, called MSAC, that uses a slightly different error measurement, check it out.
The bad news, in my experience, is that it is little likely to obtain a 100% precision homography most of the times. If you have several similar frames, it is possible that you see that homography changes a little between them.
There are tricks to improve this. For example, after obtaining a homography with RANSAC, you can use it to project your model into the image, and look for new correspondences, so you can find another homography that should be more accurate.
Your target has a lot of symmetric and similar elements. As other people mentioned (and you clarified later) the point spacing and point number can be a problem. Another problem is that SIFT is not designed to deal with significant perspective distortions that are present in your case. Try to track your object through smaller rotations and as was mentioned reproject it using the latest homography to make it look as close as possible to the original. This will also allow you to skip processing heavy SIFT and to use something as lightweight as FAST with cross correlation of image patches for matching.
You also may eventually come to understanding that using points is not enough. You have to use all that you got and this means lines or conics. If a homography transforms a point Pb = H* Pa it is easy to verify that in homogeneous coordinates line Lb = Henv.transposed * La. this directly follows from the equation La’.Pa = 0 = La’ * Hinv * H * Pa = La’ * Hinv * Pb = Lb’.Pb
The possible min. configurations is 1 line and three points or three lines and one point. Two lines and two points doesn’t work. You can use four lines or four points as well. Of course this means that you cannot use the openCV function anymore and has to write your own DLT and then non-linear optimization.

obtaining 2d-3d point correspondences for pnp or posit

I am trying to estimate the pose and position of a satellite given an image of it. I have a 3D model of the satellite. Using either PnP solvers or POSIT works great when I pick out the point correspondences myself, however I need to to find a method to match the points up automatically. Using a corner detector (best one I found so far is based on the contour) I can find all the relevant points in the image in addition a few spurious points. However I need to match a given point in the image to the correct point in the 3D model. The articles I have read on the subject always seem to assume that we have found the point pairs without going into details about how to do so.
Is there any approach usually taken that can determine these correspondences based on some invariant features? Or should i resort to a different method not based on corner points?
You can have a look at the SoftPOSIT algorithm, which determines 3D-2D correspondences and then executes POSIT algorithm. As far as I know Matlab code is available for SoftPOSIT.
ou have to do PnP with RANSAC, see openCV code solvePnPRansac(). This method can tolerate a high percent of mismatches so you don't need to be precise with all your matches but just have a certain percent of correct ones (even as low as 30%). Of course the min number of right correspondences is 4.
Speaking of invariant features - if the amount of rotation between neighbouring frame is small you don't need to use invariant features. Even a small patch of with grey intensities would suffice to find a match. The only problem is that you have to update your descriptor or even choose a different feature point on your model depending on the model rotation. The latter may be hard to do since you have to know 3D coordinate of every feature.

motion reconstruction from a single camera

I have a single calibrated camera (known intrinsic parameters, i.e. camera matrix K is known, as well as the distortion coefficients).
I would like to reconstruct the camera's 3d trajectory. There is no a-priori knowledge about the scene.
simplifying the problem by presenting two images that look on the same scene and extracting two set of corresponding matched feature points from them (SIFT, SURF, ORB, etc.)
My problem is how can I calculate the camera extrinsic parameters (i.e. the rotation matrix R and the translation vector t ) between the to viewpoints?
I have managed to calculate the fundamental matrix, and since K is know, the essential matrix as well. using David Nister's efficient solution to the Five-Point Relative Pose Problem I've managed to get 4 possible solution but:
the constraint on the essential matrix E ~ U * diag (s,s,0) * V' doesn't always apply - causing incorrect results.
[EDIT]: taking the average singular value seems to correct the results :) one down
how can I tell which one of the four is the correct one?
Thanks
Your solution to point 1 is correct: diag( (s1 + s2)/2, (s1 + s2)/2, 0).
As for telling which one of the four solutions is correct, only one will give positive depths for all points with respect to the camera frame. That's the one you want.
Code for checking which solution is correct can be found here: http://cs.gmu.edu/%7Ekosecka/examples-code/essentialDiscrete.m from http://cs.gmu.edu/%7Ekosecka/bookcode.html
They use the determinants of U and V to determine the solution with the correct orientation. Look for the comment "then four possibilities are". Since you're only estimating the essential matrix, it's susceptible to noise and does not behave well at all if all of the points are coplanar.
Also, the translation is only recovered to within a constant scaling factor, so the fact that you're seeing a normalized translation vector of unit magnitude is exactly correct. The reason is that the depth is unknown and estimated to be 1. You'll have to find some way to recover the depth as in the code for the eight-point algorithm + 3d reconstruction (Algorithm 5.1 in the bookcode link.)
The book the sample code above is taken from is also a very good reference. http://vision.ucla.edu/MASKS/ Chapter 5, the one you're interested in, is available on the Sample Chapters link.
Congrats on your hard work, sounds like you've tried hard to learn these techniques.
For actual production-strength code, I'd advise to download libmv and ceres, and re-code your solution using them.
Your two questions are really one: invalid solutions are rejected based on the data you have collected. In particular, Nister's (as well as Stewenius's) algorithm is normally used in the inner loop of a RANSAC-like solver, which selects for the solution with the best fit / max number of inliers.

How is a homography calculated?

I am having quite a bit of trouble understanding the workings of plane to plane homography. In particular I would like to know how the opencv method works.
Is it like ray tracing? How does a homogeneous coordinate differ from a scale*vector?
Everything I read talks like you already know what they're talking about, so it's hard to grasp!
Googling homography estimation returns this as the first link (at least to me):
http://cseweb.ucsd.edu/classes/wi07/cse252a/homography_estimation/homography_estimation.pdf. And definitely this is a poor description and a lot has been omitted. If you want to learn these concepts reading a good book like Multiple View Geometry in Computer Vision would be far better than reading some short articles. Often these short articles have several serious mistakes, so be careful.
In short, a cost function is defined and the parameters (the elements of the homography matrix) that minimize this cost function are the answer we are looking for. A meaningful cost function is geometric, that is, it has a geometric interpretation. For the homography case, we want to find H such that by transforming points from one image to the other the distance between all the points and their correspondences be minimum. This geometric function is nonlinear, that means: 1-an iterative method should be used to solve it, in general, 2-an initial starting point is required for the iterative method. Here, algebraic cost functions enter. These cost functions have no meaningful/geometric interpretation. Often designing them is more of an art, and for a problem usually you can find several algebraic cost functions with different properties. The benefit of algebraic costs is that they lead to linear optimization problems, hence a closed form solution for them exists (that is a one shot /non-iterative method). But the downside is that the found solution is not optimal. Therefore, the general approach is to first optimize an algebraic cost and then use the found solution as starting point for an iterative geometric optimization. Now if you google for these cost functions for homography you will find how usually these are defined.
In case you want to know what method is used in OpenCV simply need to have a look at the code:
http://code.opencv.org/projects/opencv/repository/entry/trunk/opencv/modules/calib3d/src/fundam.cpp#L81
This is the algebraic function, DLT, defined in the mentioned book, if you google homography DLT should find some relevant documents. And then here:
http://code.opencv.org/projects/opencv/repository/entry/trunk/opencv/modules/calib3d/src/fundam.cpp#L165
An iterative procedure minimizes the geometric cost function.It seems the Gauss-Newton method is implemented:
http://en.wikipedia.org/wiki/Gauss%E2%80%93Newton_algorithm
All the above discussion assumes you have correspondences between two images. If some points are matched to incorrect points in the other image, then you have got outliers, and the results of the mentioned methods would be completely off. Robust (against outliers) methods enter here. OpenCV gives you two options: 1.RANSAC 2.LMeDS. Google is your friend here.
Hope that helps.
To answer your question we need to address 4 different questions:
1. Define homography.
2. See what happens when noise or outliers are present.
3. Find an approximate solution.
4. Refine it.
Homography in a 3x3 matrix that maps 2D points. The mapping is linear in homogeneous coordinates: [x2, y2, 1]’ ~ H * [x1, y1, 1]’, where ‘ means transpose (to write column vectors as rows) and ~ means that the mapping is up to scale. It is easier to see in Cartesian coordinates (multiplying nominator and denominator by the same factor doesn’t change the result)
x2 = (h11*x1 + h12*y1 + h13)/(h31*x1 + h32*y1 + h33)
y2 = (h21*x1 + h22*y1 + h23)/(h31*x1 + h32*y1 + h33)
You can see that in Cartesian coordinates the mapping is non-linear, but for now just keep this in mind.
We can easily solve a former set of linear equations in Homogeneous coordinates using least squares linear algebra methods (see DLT - Direct Linear Transform) but this unfortunately only minimizes an algebraic error in homography parameters. People care more about another kind of error - namely the error that shifts points around in Cartesian coordinate systems. If there is no noise and no outliers two erros can be identical. However the presence of noise requires us to minimize the residuals in Cartesian coordinates (residuals are just squared differences between the left and right sides of Cartesian equations). On top of that, a presence of outliers requires us to use a Robust method such as RANSAC. It selects the best set of inliers and rejects a few outliers to make sure they don’t contaminate our solution.
Since RANSAC finds correct inliers by random trial and error method over many iterations we need a really fast way to compute homography and this would be a linear approximation that minimizes parameters' error (wrong metrics) but otherwise is close enough to the final solution (that minimizes squared point coordinate residuals - a right metrics). We use a linear solution as a guess for further non-linear optimization;
The final step is to use our initial guess (solution of linear system that minimized Homography parameters) in solving non-linear equations (that minimize a sum of squared pixel errors). The reason to use squared residuals instead of their absolute values, for example, is because in Gaussian formula (describes noise) we have a squared exponent exp(x-mu)^2, so (skipping some probability formulas) maximum likelihood solutions requires squared residuals.
In order to perform a non-linear optimization one typically employs a Levenberg-Marquardt method. But in the first approximation one can just use a gradient descent (note that gradient points uphill but we are looking for a minimum thus we go against it, hence a minus sign below). In a nutshell, we go through a set of iterations 1..t..N selecting homography parameters at iteration t as param(t) = param(t-1) - k * gradient, where gradient = d_cost/d_param.
Bonus material: to further minimize the noise in your homography you can try a few tricks: reduce a search space for points (start tracking your points); use different features (lines, conics, etc. that are also transformed by homography but possibly have a higher SNR); reject impossible homographs to speed up RANSAC (e.g. those that correspond to ‘impossible’ point movements); use low pass filter for small changes in Homographies that may be attributed to noise.

Resources