I'm working on OpenCV in visual studio 2012 , i tried to get Homography matrix to Stitch 2 images in opencv so really i don't know the steps to follow to do'it :
First you should be aware that two point correspondences, which you illustrate in your image, are not sufficient for homography estimation. A homography has 7 free parameters - there are 8 matrix elements, but a homography is a homogeneous quantity. This means you can scale the matrix as you like, which reduces the free parameters by one. To estimate 7 parameters you need at least four corresponding points.
I think understanding what you do is more important than blindly calling OpenCv methods, so please read about the real algorithms. The simplest (but non-optimal) method for homography computation is DLT, which amounts te rearranging the equation y = Hx for four or more points in order to solve for the components of H in a least squares sense. This is a nice explanation of the details:
https://engineering.purdue.edu/kak/courses-i-teach/ECE661.08/solution/hw4_s1.pdf
The principal academic reference is Multiple View Geometry for Computer Vision by Heartley and Zisserman.
As pointed out in the comments, the OpenCv docs are here:
http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html#findhomography
To establish an homography between two images you need at least 4 points.
Each one has two coordinates, giving you two equations / constraints.
An homography has 8 degrees of freedom (it's a 3x3 full-rank matrix with 9 elements, -1 d.o.f. because of scale since it operates on homogeneous coordinates).
Note that only images of planar surfaces, or images taken by a camera rotating around its center can be related by an homography (in this latter case, the planar surface is the plane at infinity).
This page shows how to compute an homography with feature extraction and RANSAC.
Related
I am trying to understand the concept of homography. It gives features but I can't get that how does it calculate features from images?
A homography is nothing but a mapping between points on one surface or plane to points on another. In the case of computer vision, it is a matrix specifying the transformation between two views of images, for example.
A homography can be estimated by identifying the keypoints in both the images and then estimating the transformation between the two views. There are many keypoint descriptors available that help in identifying these keypoints.
In camera calibration, an extrinsic matrix is computed by capturing different views of an object of known geometry, like a Chessboard from which the corner points are detected. The matrix is estimated by mathematically solving for the detected points from the many different views captured.
A details derivation of the estimation and solving to obtain the homography matrix can be found in this book. :)
I have two images obtained by a calibrated camera from two different poses. I also have correspondences of 2D points between the images. Some of the points have depth information, so I also know their 3D coordinates. I want to calculate the relative pose between the images.
I know I can compute a fundamental matrix or an essential matrix from the 2D points. I also know PnP can find the pose with 2D-to-3D correspondences and that it's also doable getting just correspondences of 3D points. However, I don't know any algorithm that takes advantage of all the available information. Is there any?
There is only one such algorithm: Bundle Adjustment - everything else is a hack. Get your initial estimates separately, use any "reasonable & simple" hacky way of merging them to get an initial estimate, then byte the bullet and bundle. If you are coding in C++, Google's Ceres is my recommended B.A. library.
I am using standard OpenCV functions to calibrate camera for intrinsic parameters. In order to obtain good results, I know we have to use images of the chessboard from different angles (considering different planes in the 3D). This is stated in all the documentations
and papers but I really don't understand, why is it so important for us to consider different planes and if there is an optimal number of planes that we have to consider for the best calibration results?
I will be glad if you can provide me reference to some paper or documentation which explains this. (I think Zhang's paper talks about it but, its mathematically intensive and was hart to digest.)
Thanks
Mathematically, a unique solution for the intrinsic parameters (up to scale) is defined only if you have 3 or more distinct images of the planar target. See page 6 of Zhang's paper: "If n images of the model plane are observed, by stacking n such equations as (8) we have Vb = 0 ; (9) where V is a 2n×6 matrix. If n ≥ 3, we will have in general a unique solution b defined up to a scale factor..."
There isn't an "optimal" number of planes, where data are concerned, the more you have the merrier you are. But as the solution starts to converge, the marginal gain in calibration accuracy due to adding an extra image becomes negligible. Of course, this assumes that the images show planes well separated in both pose and location.
See also this other answer of mine for practical tips.
If you're looking for a little intuition, here's an example of why one plane isn't enough. Imagine your calibration chessboard is tilting away from you at a 45° angle:
You can see that when you move up the chessboard by 1 meter in the +y direction, you also move away from the camera by 1 meter in the +z direction. This means there's no way to separate the effect of moving in the y direction vs the z direction. The y and z movement directions are effectively tied to each other, for all our training points. So, if we just look at points on this one plane, there's no way to tease apart the effects of y movement vs z movement.
For example, from this 1 plane, we can't tell the difference between these scenarios:
The camera has perspective distortion such that things appear smaller in the image as they move in the world's +y direction.
The camera focal length is such that things appear smaller in the image as they move in the world's +z direction.
Any mixture of the effects in #1 and #2.
Mathematically, this ambiguity means that there are many equally possible solutions when OpenCV tries to fit a camera matrix to match the data. (Note that the 45° angle was not important. Any plane you choose will have the same problem: training examples' (x,y,z) dimensions are coupled together, so you can't separate their effects.)
One last note: if you make enough assumptions about the camera matrix (e.g. no perspective distortion, x and y scale identically, etc) then you can end up with a situation with fewer unknowns (in an extreme case, maybe you're just calculating the focal length) and in that case you could calibrate with just 1 plane.
I'm currently working on a project that deals with the reconstruction based on a set of images, in a multi-view stereo approach. As such I need to know the several images pose in space. I find matching features using surf, and from the correspondences I find the essential matrix.
Now comes the problem: It is possible to decompose the essential matrix with SVD, but this can lead to 4 different results, as I read in a book. How can I obtain the correct one, assuming this is possible?
What other algorithms can I use for this?
Wikipedia says:
It turns out, however, that only one of the four classes of solutions
can be realized in practice. Given a pair of corresponding image
coordinates, three of the solutions will always produce a 3D point
which lies behind at least one of the two cameras and therefore cannot
be seen. Only one of the four classes will consistently produce 3D
points which are in front of both cameras. This must then be the
correct solution.
If you have the extrinsic calibration parameters for the camera in the first frame, or if you assume that it lies at a default calibration, say translation of (0,0,0) and rotation of (0,0,0), then you can determine which of the decompositions is the valid one.
Thanks to Zaphod answer I was able to solve my problem. Here's what I did:
First I calculated the Essential Matrix (E) from a set of point correspondences in both images.
Using SVD, decomposed it into 2 solutions. Using the negated Essential Matrix -E (which also satisfies the same constraints) I arrived at 2 more solutions for a total of 4 possible camera positions and orientations.
Then, for all solutions I triangulated the point correspondences and determined which intersected in front of both cameras, by taking the dot product of the point coordinate and each of the cameras viewing direction. I both are positive, then that intersection is in front of both cameras.
In the end the solution that delivers the most intersections in front of the cameras is the chosen one.
I try to match two overlapping images captured with a camera. To do this, I'd like to use OpenCV. I already extracted the features with the SurfFeatureDetector. Now I try to to compute the rotation and translation vector between the two images.
As far as I know, I should use cvFindExtrinsicCameraParams2(). Unfortunately, this method require objectPoints as an argument. These objectPoints are the world coordinates of the extracted features. These are not known in the current context.
Can anybody give me a hint how to solve this problem?
The problem of simultaneously computing relative pose between two images and the unknown 3d world coordinates has been treated here:
Berthold K. P. Horn. Relative orientation revisited. Berthold K. P. Horn. Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 545 Technology ...
EDIT: here is a link to the paper:
http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.64.4700
Please see my answer to a related question where I propose a solution to this problem:
OpenCV extrinsic camera from feature points
EDIT: You may want to take a look at bundle adjustments too,
http://en.wikipedia.org/wiki/Bundle_adjustment
That assumes an initial estimate is available.
EDIT: I found some code resources you might want to take a look at:
Resource I:
http://www.maths.lth.se/vision/downloads/
Two View Geometry Estimation with Outliers
C++ code for finding the relative orientation of two calibrated
cameras in presence of outliers. The obtained solution is optimal in
the sense that the number of inliers is maximized.
Resource II:
http://lear.inrialpes.fr/people/triggs/src/ Relative orientation from
5 points: a somewhat more polished C routine implementing the minimal
solution for relative orientation of two calibrated cameras from
unknown 3D points. 5 points are required and there can be as many as
10 feasible solutions (but 2-5 is more common). Also requires a few
CLAPACK routines for linear algebra. There's also a short technical
report on this (included with the source).
Resource III:
http://www9.in.tum.de/praktika/ppbv.WS02/doc/html/reference/cpp/toc_tools_stereo.html
vector_to_rel_pose Compute the relative orientation between two
cameras given image point correspondences and known camera parameters
and reconstruct 3D space points.
There is a theoretical solution, however, the OpenCV implementation of camera pose estimation lacks the needed tools.
The theoretical approach:
Step 1: extract the homography (the matrix describing the geometrical transform between images). use findHomography()
Step 2. Decompose the result matrix into rotations and translations. Use cv::solvePnP();
Problem: findHomography() returns a 3x3 matrix, corresponding to a projection from a plane to another. solvePnP() needs a 3x4 matrix, representing the 3D rotation/translation of the objects. I think that with some approximations, you can modify the solvePnP to give you some results, but it requires a lot of math and a very good understanding of 3D geometry.
Read more about at http://en.wikipedia.org/wiki/Transformation_matrix