Homography and Affine Transformation - opencv

Hi, I am a beginner in computer vision and I wish to know what exactly the difference is between a homography and an affine transformation. If you want to find the translation between two images, which one would you use and why? From the papers and definitions I found online, I have yet to find the difference between them and where one is used instead of the other.
Thanks for your help.


I have set it out in layman's terms.
Homography
A homography is a 3x3 matrix that maps a given set of points in one image to the corresponding set of points in another image.
It maps each point (x1, y1) of the first image to the corresponding point (x2, y2) of the second image; in homogeneous coordinates, s·[x2, y2, 1]^T = H·[x1, y1, 1]^T for some scale factor s, where H is the homography matrix.
Where is it used?
You may want to align two such images; you can do so using the homography.
The second image is warped (mapped) with respect to the first.
Another application is panoramic stitching.
Visit THIS BLOG for more
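In OpenCV this alignment boils down to two calls: findHomography to estimate H from matched points and warpPerspective to warp one image onto the other. A minimal sketch, assuming pts1/pts2 are already-matched corresponding points (for example from SIFT plus a descriptor matcher):

    // Warp image 2 so that it lines up with image 1, given matched points.
    #include <opencv2/calib3d.hpp>
    #include <opencv2/imgproc.hpp>
    #include <vector>

    cv::Mat alignWithHomography(const cv::Mat& img2,
                                const std::vector<cv::Point2f>& pts2,  // points in image 2
                                const std::vector<cv::Point2f>& pts1,  // corresponding points in image 1
                                const cv::Size& outSize)
    {
        // 3x3 homography mapping image 2 points onto image 1 (RANSAC rejects outliers).
        cv::Mat H = cv::findHomography(pts2, pts1, cv::RANSAC);

        // Warp the second image with respect to the first.
        cv::Mat aligned;
        cv::warpPerspective(img2, aligned, H, outSize);
        return aligned;
    }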
Affine transformation
An affine transform produces a 2x3 matrix that is applied to the entire image; it does not map an arbitrary set of point correspondences the way a homography does.
Hence, in an affine transformation, the parallelism of lines is always preserved (as mentioned by EdChum).
Where is it used?
It is used where you want to alter the entire image, for example (see the sketch after this list):
Rotation (self-explanatory)
Translation (shifting the entire image by a certain amount up/down or left/right)
Scaling (shrinking or enlarging the image)
See THIS PAGE for more
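In OpenCV terms, these whole-image operations are a 2x3 matrix passed to warpAffine. A minimal sketch combining the three of them, assuming img is a loaded cv::Mat:

    // Rotate, scale and translate the whole image with a single affine matrix.
    #include <opencv2/imgproc.hpp>

    cv::Mat rotateScaleTranslate(const cv::Mat& img)
    {
        // Rotation by 30 degrees about the image center, combined with 0.75x scaling.
        cv::Point2f center(img.cols / 2.0f, img.rows / 2.0f);
        cv::Mat M = cv::getRotationMatrix2D(center, 30.0, 0.75);

        // Translation: shift the result 40 px right and 20 px down.
        M.at<double>(0, 2) += 40.0;
        M.at<double>(1, 2) += 20.0;

        cv::Mat out;
        cv::warpAffine(img, out, M, img.size());
        return out;
    }

Because M is affine (its last row is implicitly [0, 0, 1]), parallel lines in the input stay parallel in the output.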

Related

Why doesn't the homography matrix work for every point in the image?

I'm trying to find a homography between two images. If I calculate the homography using only corresponding points on the ground, then the transformation works well only for points on the ground. If I calculate the homography using only corresponding points that are not on the ground, then the transformation works well only for points at that level.
How do I get a homography that will work for any point in the image? Is it even possible?

What is the difference between the fundamental, essential and homography matrices?

I have two images that are taken from different positions. The 2nd camera is located to the right, up, and backward with respect to the 1st camera.
So I think there is a perspective transformation between the two views and not just an affine transform, since the cameras are at relatively different depths. Am I right?
I have a few corresponding points between the two images. I am thinking of using these corresponding points to determine the transformation of each pixel from the 1st to the 2nd image.
I am confused by the functions findFundamentalMat and findHomography. Both return a 3x3 matrix. What is the difference between the two?
Is there any condition required / prerequisite for using them (i.e. when to use which)?
Which one should I use to transform points from the 1st image to the 2nd image? Do the 3x3 matrices that the functions return include the rotation and translation between the two image frames?
From Wikipedia, I read that the fundamental matrix is a relation between corresponding image points. In an SO answer here, it is said the essential matrix E is required to get corresponding points. But I do not have the internal camera matrix to calculate E. I just have the two images.
How should I proceed to determine the corresponding point?
Without any extra assumption on the world scene geometry, you cannot affirm that there is a projective transformation between the two views. This is only true if the scene is planar. A good reference on that topic is the book Multiple View Geometry in Computer Vision by Hartley and Zisserman.
If the world scene is not planar, you should definitely not use the findHomography function. You can use the findFundamentalMat function, which will provide you with an estimate of the fundamental matrix F. This matrix describes the epipolar geometry between the two views. You may use F to rectify your images in order to apply stereo algorithms to determine a dense correspondence map.
I assume you are using the expression "perspective transformation" to mean "projective transformation". To the best of my knowledge, a perspective transformation is a world to image mapping, not an image to image mapping.
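A minimal sketch of that uncalibrated pipeline, assuming pts1/pts2 are corresponding points in the two images (only standard calib3d calls are used):

    #include <opencv2/calib3d.hpp>
    #include <vector>

    void epipolarPipeline(const std::vector<cv::Point2f>& pts1,
                          const std::vector<cv::Point2f>& pts2,
                          const cv::Size& imgSize)
    {
        // Estimate the fundamental matrix; RANSAC discards bad correspondences.
        cv::Mat F = cv::findFundamentalMat(pts1, pts2, cv::FM_RANSAC);

        // Epipolar lines in image 2 for the points of image 1: the match of each
        // point must lie on its line, which confines the correspondence search.
        std::vector<cv::Vec3f> linesIn2;
        cv::computeCorrespondEpilines(pts1, 1, F, linesIn2);

        // Uncalibrated rectification (no intrinsics needed), as a first step
        // towards dense stereo matching.
        cv::Mat H1, H2;
        cv::stereoRectifyUncalibrated(pts1, pts2, F, imgSize, H1, H2);
    }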
The fundamental matrix F satisfies the relation
x'^T F u = 0
with x' in one image and u in the other, iff x' and u are projections of the same 3D point.
Also,
l = F u
defines a line (l^T x' = 0) on which the corresponding point of u must lie, so it can be used to confine the search space for correspondences.
A homography maps a point on one projection of a plane to the corresponding point on another projection of the plane:
x' = H u
There are only two cases where the transformation between two views is a projective transformation (i.e. a homography): either the scene is planar, or the two views were generated by a camera rotating around its center.
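A minimal sketch showing the two relations side by side, assuming F and H were estimated earlier and (u, x') is one matched pair (u in the first image, x' in the second):

    #include <opencv2/core.hpp>
    #include <iostream>

    void checkRelations(const cv::Matx33d& F, const cv::Matx33d& H,
                        const cv::Point2d& u, const cv::Point2d& xp)
    {
        cv::Vec3d uh(u.x, u.y, 1.0), xh(xp.x, xp.y, 1.0);

        // Epipolar constraint: x'^T F u is (close to) 0 for a true match.
        std::cout << "x'^T F u = " << xh.dot(F * uh) << std::endl;

        // l = F u is the epipolar line in the second image on which x' must lie.
        cv::Vec3d l = F * uh;
        std::cout << "epipolar line: " << l[0] << " " << l[1] << " " << l[2] << std::endl;

        // Homography (planar scene or pure rotation only): x' ~ H u up to scale.
        cv::Vec3d Hu = H * uh;
        std::cout << "H u -> (" << Hu[0] / Hu[2] << ", " << Hu[1] / Hu[2] << ")" << std::endl;
    }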

SIFT matches and recognition?

I am developing an application where I am using SIFT + RANSAC and homography to find an object (OpenCV C++, Java). The problem I am facing is that when there are many outliers, RANSAC performs poorly.
For this reason I would like to try what the author of SIFT said works pretty well: voting.
I have read that we should vote in a 4-dimensional feature space, where the 4 dimensions are:
Location [x, y] (some call it translation)
Scale
Orientation
While with OpenCV it is easy to get the match scale and orientation with:
cv::KeyPoint::octave
cv::KeyPoint::angle
I am having a hard time understanding how I can calculate the location.
I have found an interesting slide showing that with only one match we are able to draw a bounding box.
But I don't get how I could draw that bounding box with just one match. Any help?
You are looking for the largest set of matched features that fit a geometric transformation from image 1 to image 2. In this case, it is the similarity transformation, which has 4 parameters: translation (dx, dy), scale change ds, and rotation d_theta.
Let's say you have matched two features: f1 from image 1 and f2 from image 2. Let (x1, y1) be the location of f1 in image 1, let s1 be its scale, and let theta1 be its orientation. Similarly, you have (x2, y2), s2, and theta2 for f2.
The translation between two features is (dx,dy) = (x2-x1, y2-y1).
The scale change between two features is ds = s2 / s1.
The rotation between two features is d_theta = theta2 - theta1.
So, dx, dy, ds, and d_theta are the dimensions of your Hough space. Each bin corresponds to a similarity transformation.
Once you have performed Hough voting, and found the maximum bin, that bin gives you a transformation from image 1 to image 2. One thing you can do is take the bounding box of image 1 and transform it using that transformation: apply the corresponding translation, rotation and scaling to the corners of the image. Typically, you pack the parameters into a transformation matrix, and use homogeneous coordinates. This will give you the bounding box in image 2 corresponding to the object you've detected.
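A minimal sketch of that one-match case, assuming kp1/kp2 are the matched cv::KeyPoint objects and (w, h) is the size of image 1; the rotation and scaling are applied about the matched feature point before translating it onto its counterpart:

    #include <opencv2/core.hpp>
    #include <cmath>
    #include <vector>

    std::vector<cv::Point2f> boxFromOneMatch(const cv::KeyPoint& kp1,
                                             const cv::KeyPoint& kp2,
                                             float w, float h)
    {
        // Similarity parameters from a single match (the Hough-space coordinates).
        float ds = kp2.size / kp1.size;                                             // scale change
        float dtheta = static_cast<float>((kp2.angle - kp1.angle) * CV_PI / 180.0); // rotation, radians

        float c = ds * std::cos(dtheta), s = ds * std::sin(dtheta);

        // 3x3 similarity in homogeneous coordinates: rotate/scale about kp1,
        // then move kp1 onto kp2.
        cv::Matx33f S(c, -s, kp2.pt.x - (c * kp1.pt.x - s * kp1.pt.y),
                      s,  c, kp2.pt.y - (s * kp1.pt.x + c * kp1.pt.y),
                      0,  0, 1);

        // Transform the corners of image 1's bounding box into image 2.
        std::vector<cv::Point2f> corners = { {0, 0}, {w, 0}, {w, h}, {0, h} }, box;
        cv::perspectiveTransform(corners, box, S);
        return box;
    }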
When using the Hough transform, you create a signature storing the displacement vectors of every feature from the template centroid (either (w/2, h/2) or computed from central moments).
E.g. for 10 SIFT features found on the template, their positions relative to the template's centroid form a vector of offsets {a, b}. Now, let's search for this object in a query image: every SIFT feature found in the query image that matches one of the template's 10 casts a vote for its corresponding centroid.
votemap(feature.x - a, feature.y - b) += 1, where (a, b) is the offset stored for that particular feature.
If several of those features vote for (roughly) the same point (clustering is essential), you have found an object instance.
Signature creation and voting are inverse procedures. Suppose the stored displacement is V = (-20, -10). During the search in the new image, when a match is found, we detect its relative orientation and size and cast the vote accordingly. E.g. if the instance appears at half the size and rotated by -10 degrees, the vote is cast at -V scaled by 0.5 and rotated by -10 degrees away from the SIFT feature.
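A minimal sketch of that voting step, assuming templKp/queryKp are a matched pair of cv::KeyPoint and templCentroid is the template centroid; the stored offset is rotated and scaled by the relative orientation and size of the match before the vote is cast:

    #include <opencv2/core.hpp>
    #include <cmath>

    // Returns the point in the query image at which this match votes for the centroid.
    cv::Point2f voteForCentroid(const cv::KeyPoint& templKp,
                                const cv::KeyPoint& queryKp,
                                const cv::Point2f& templCentroid)
    {
        // Displacement of the template feature from the template centroid (the stored signature).
        cv::Point2f offset = templKp.pt - templCentroid;

        // Relative scale and rotation between the two matched features.
        float ds = queryKp.size / templKp.size;
        float dtheta = static_cast<float>((queryKp.angle - templKp.angle) * CV_PI / 180.0);

        // Rotate and scale the stored offset before casting the vote.
        float c = std::cos(dtheta), s = std::sin(dtheta);
        cv::Point2f adjusted(ds * (c * offset.x - s * offset.y),
                             ds * (s * offset.x + c * offset.y));

        // The vote lands at the predicted centroid location in the query image.
        return queryKp.pt - adjusted;
    }

Accumulating these votes in a 2D map and looking for a dense cluster gives the detected object centroid(s).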
To complete Dima's answer, one needs to add that the 4D Hough space is quantized into a (possibly small) number of 4D boxes, where each box corresponds to the similarity given by its center.
Then, for each possible similarity obtained via a tentative matching of features, add 1 to the corresponding box (or cell) in the 4D space. The output similarity is given by the cell with the most votes.
In order to compute the transform from one match, just use Dima's formulas from his answer. For several pairs of matches, you may need to use a least-squares fit.
Finally, the transform can be applied with the function cv::warpPerspective(), where the third row of the perspective matrix is set to [0,0,1].
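A minimal sketch of those last two steps, assuming pts1/pts2 are the matched point locations from the winning cell; estimateAffinePartial2D is one possible least-squares fit for a similarity over several matches:

    #include <opencv2/calib3d.hpp>
    #include <opencv2/imgproc.hpp>
    #include <vector>

    cv::Mat warpWithSimilarity(const cv::Mat& img1, const cv::Size& outSize,
                               const std::vector<cv::Point2f>& pts1,
                               const std::vector<cv::Point2f>& pts2)
    {
        // 2x3 similarity (rotation + uniform scale + translation), least squares over all matches.
        cv::Mat A = cv::estimateAffinePartial2D(pts1, pts2);

        // Embed it into a 3x3 perspective matrix with the third row set to [0, 0, 1].
        cv::Mat H = cv::Mat::eye(3, 3, CV_64F);
        A.copyTo(H(cv::Rect(0, 0, 3, 2)));

        cv::Mat warped;
        cv::warpPerspective(img1, warped, H, outSize);
        return warped;
    }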

Can I reuse a homography matrix calculated from 2 different images of the same scene taken by 2 different cameras?

I'm trying to learn OpenCV. I've a question regarding homography and epipolar geometry.
Suppose I've calculated a homography with the cvFindHomography() function, using matched feature points from two static images taken with two cameras from two different viewpoints.
Is it wrong to reuse that homography matrix to find, for points in the image taken by camera 2 (left), the corresponding points in camera 1 (right)? I know that x' = H.x, where x' is a 2D homogeneous feature point in the left image, x is the corresponding 2D homogeneous feature point in the right image, and H is the homography matrix. The 2D points in question were not among those used to calculate the homography matrix.
What I mean to ask is: can I reuse the homography matrix calculated for those two cameras to find corresponding points in any images that were not used to calculate it?
Does it matter which images I use once the homography has been determined from a fixed pair of images, or do I need to calculate it every time?
You can use the homography to project points from one image to the other as long as the cameras don't move and the scene doesn't change.
I understand that those (calibrated) cameras take the pictures and then you work with those two pictures all the time. All right: if you calculate the homography, then you can project any points you want between the two images. You will get some error, of course, but that is due to noise in the images and to non-linearities that affect the linear method used by findHomography.
If you keep capturing images with the cameras, then you have to compute the homography again for every new pair of images, because you don't know a priori how the scene will change.

Project points using intrinsic and transformation matrices

Currently I am working on a 3D image visualization project using C# and Emgu CV (.NET). In that project, the following steps have already been done with 2 images of the same scene (slightly different in rotation and translation):
Feature detection (SURF), matching, and homography calculation
Calculation of the fundamental matrix
Calculation of the essential matrix using the fundamental and camera intrinsic matrices
Finally, calculation of the rotation and translation matrices
I have also obtained 4 possible answers for the transformation matrix (3x4 [R|T]) using the different combinations of R and T obtained by changing their signs. Now I want to select the correct transformation matrix from those 4 answers. Before that, I want to check whether one of the answers is correct. So I have to re-project the points of the second image using the camera intrinsic matrix and each of the transformation matrices. After that I can compare the resulting points with the second image's points to confirm the result (the transformation matrix).
My question is: how do I combine the transformation matrix (the 3x3 rotation and 3x1 translation matrices) with the camera intrinsic matrix to project points to image points using Emgu CV?
Or is there any alternative method to confirm the transformation matrix that I obtained?
Thanks in advance for any help.
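One standard way to confirm the transformation matrix is a cheirality check: triangulate the matched points for each candidate [R|T] and keep the candidate that places the most points in front of both cameras (this is essentially what cv::recoverPose does internally). A minimal sketch with the OpenCV C++ API (the corresponding functions are also available in Emgu CV), assuming K, R and t are CV_64F matrices and pts1/pts2 are the matched image points:

    #include <opencv2/calib3d.hpp>
    #include <vector>

    int countPointsInFront(const cv::Mat& K, const cv::Mat& R, const cv::Mat& t,
                           const std::vector<cv::Point2f>& pts1,
                           const std::vector<cv::Point2f>& pts2)
    {
        // Projection matrices: camera 1 at the origin, camera 2 at [R|t].
        cv::Mat P1 = K * cv::Mat::eye(3, 4, CV_64F);
        cv::Mat Rt;
        cv::hconcat(R, t, Rt);
        cv::Mat P2 = K * Rt;

        // Triangulate all matches; the result is 4xN in homogeneous coordinates.
        cv::Mat X;
        cv::triangulatePoints(P1, P2, pts1, pts2, X);
        X.convertTo(X, CV_64F);

        int inFront = 0;
        for (int i = 0; i < X.cols; ++i) {
            cv::Mat p = X.col(i) / X.at<double>(3, i);   // Euclidean 3D point
            double z1 = p.at<double>(2);                 // depth in camera 1
            cv::Mat p2 = R * p.rowRange(0, 3) + t;       // same point in camera 2's frame
            double z2 = p2.at<double>(2);                // depth in camera 2
            if (z1 > 0 && z2 > 0) ++inFront;
        }
        return inFront;  // evaluate all 4 candidates and keep the one with the largest count
    }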
