calculating the destination points for OpenCV's findHomography - image-processing

EDIT: I've now found this similar question with a very detailed answer:
proportions of a perspective-deformed rectangle
I'm using OpenCV's findHomography() and warpPerspective() methods to "de skew" a photograph of a sheet of paper. I have this largely working but I'm stuck on a detail.
The part I don't understand is how to calculate the optimum set of destination points to pass to findHomography(). I know that I want my output to be rectangular, but I don't know the ratio of the width to the height of the rectangle. I also want the output rectangle to be sized such that there is minimal scaling of the output image when I apply the transform via warpPerspective(). All I have are the four points that form the quadrilateral I want to transform in the source image. How do I calculate an optimum-sized destination rectangle?

findHomography() needs at least four point correspondences (four is the minimum for the Direct Linear Transform, DLT). If you want the optimal set, you need the 4-point set whose DLT homography gives the minimum reprojection error. In other words, you need a method that separates inliers from outliers for the particular mathematical model of the DLT.
That method is RANSAC, and OpenCV has it implemented. You will find examples of findHomography() combined with RANSAC.
Personally, I find one problem with this: the number of RANSAC iterations in OpenCV is too high. If you are looking for optimal speed, you will have to dig into the code.
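For the sizing part of the question, one common heuristic (an assumption here, not something stated in the answer above) is to make the destination rectangle as wide as the longer of the two horizontal edges of the quadrilateral and as tall as the longer of the two vertical edges, so that no side has to be shrunk. The sketch below assumes the four corners are already ordered top-left, top-right, bottom-right, bottom-left; the function name `destination_rectangle` is hypothetical. It does not recover the true aspect ratio of the sheet, which is what the linked question addresses.

```python
import math


def destination_rectangle(quad):
    """Return destination corners and (width, height) for a source quadrilateral.

    quad: four (x, y) corners ordered top-left, top-right,
    bottom-right, bottom-left (this ordering is an assumption).
    """
    tl, tr, br, bl = quad

    def length(a, b):
        # Euclidean length of the edge between two corners
        return math.hypot(b[0] - a[0], b[1] - a[1])

    # Use the longer of each pair of opposite edges so nothing is downscaled.
    width = round(max(length(tl, tr), length(bl, br)))
    height = round(max(length(tl, bl), length(tr, br)))

    dst = [(0, 0), (width - 1, 0), (width - 1, height - 1), (0, height - 1)]
    return dst, (width, height)


corners = [(10, 20), (410, 40), (430, 330), (0, 300)]
dst_points, size = destination_rectangle(corners)
print(dst_points, size)
```

The returned corners would be the destination points for findHomography(), and the size would be the output size for warpPerspective().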

Related

Compute similarity of 2 binary images

I have 2 binary images (black/white). They have the same content but might slightly differ in rotation/translation and scale of the content (text).
How do I get a simple measure for the similarity of 2 images in OpenCV?
The operation needs to be as fast as possible (live).
Examples: image A and image B (images not included).
You can use the LogPolarFFT registration algorithm to register the images, then compare them using a similarity check (PSNR or SSIM).
You need to remove the scale and rotation.
To remove the rotation, take the PCA, which gives you the primary axis, and rotate both images so that the primary axis lies along x. (Use a shear rotate.) To remove the scale, either simply take a bounding box or, if there's a bit of noise in there, take the area and scale one image until the areas are equal. Just sample pixel centres to scale (a bit icky, but it is hard to scale a binary image nicely).
I should put some support for this in the binary image library. You might find the material there helpful:
http://malcolmmclean.github.io/binaryimagelibrary/
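As a rough sketch of the PCA step, the primary axis of a binary image can be obtained from the second-order central moments of its foreground pixels, which is equivalent to taking the major eigenvector of their 2x2 covariance matrix. The function name `principal_axis_angle` and the list-of-rows image representation are assumptions made for illustration; this is not code from the binary image library.

```python
import math


def principal_axis_angle(image):
    """Angle in radians of the primary axis of the foreground pixels.

    image: list of rows containing 0 (background) or 1 (foreground).
    """
    pts = [(x, y) for y, row in enumerate(image) for x, v in enumerate(row) if v]
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n

    # Second-order central moments (entries of the covariance matrix)
    mu20 = sum((x - cx) ** 2 for x, _ in pts) / n
    mu02 = sum((y - cy) ** 2 for _, y in pts) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in pts) / n

    # Orientation of the major eigenvector of [[mu20, mu11], [mu11, mu02]]
    return 0.5 * math.atan2(2 * mu11, mu20 - mu02)


diagonal = [[1 if x == y else 0 for x in range(5)] for y in range(5)]
angle_deg = math.degrees(principal_axis_angle(diagonal))
print(angle_deg)  # about 45 degrees (the image y axis points down)
```

Rotating each image by the negative of this angle would bring the primary axis onto x before comparing.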
The one way I can think of is to find keypoints using SIFT/SURF, then calculate the homography between the two images and warp one of them according to it (so as to fix rotation and translation). Then you can simply calculate the similarity in terms of the sum of absolute differences (SAD).
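Once the two images are aligned, the SAD step itself is short. The sketch below assumes both images are already registered, have the same size, and are stored as lists of rows of 0/1 values; the helper names `sad` and `sad_similarity` are hypothetical, and the normalisation to a 0..1 score is just one convenient choice.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized images."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same size")
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def sad_similarity(a, b):
    """Map SAD to a score in [0, 1] for binary images; 1.0 means identical."""
    total_pixels = sum(len(row) for row in a)
    return 1.0 - sad(a, b) / total_pixels


image_a = [[0, 1, 1],
           [0, 1, 0]]
image_b = [[0, 1, 0],
           [0, 1, 0]]
print(sad(image_a, image_b), sad_similarity(image_a, image_b))
```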
You need to use a rotation- and scale-invariant approach. You also need threshold-based area segmentation before feature extraction. I suggest the following steps:
1/ Binary threshold & scan line algorithm can be used to segment the specific text line area.
2/ After segmentation, adjust the rotation using a warpAffine transformation. See this example
3/ On the adjusted image, apply SIFT, BRISK or SURF to extract features.
4/ Use a template matching approach to match or to generate a similarity or distance score.
See the following link for more detail:
scale and rotation Template matching

Expand homography matrix for distortion

I have two sets of corresponding matches, and I want to compute the homography matrix between them. However, I found that the transformation between these points cannot be modeled using just a homography matrix. I figured this out by observing that some lines in the original set of points are not mapped to lines in the second set.
For example:
The example above is very extreme; in reality the distortion is much smaller than that. The distortion usually arises because the first set of points was extracted from an image taken by a scanner, whereas the other set was extracted from a photo taken by a mobile phone.
The Question:
How can I expand or generalize the homography matrix so that it includes this case? In other words, I want a non-line-preserving transformation model to use instead of the homography matrix. Any suggestions?
P.S. The OpenCV library is preferred if there is something ready to use.
EDIT:
Eliminating the distortion may not be an option for me, because the photos are somewhat complex, I do not always have the same camera, and I am supposed to deal with images from an unknown source (the back-end is separated from the front-end). However, I have a reference which is planar and a query which has a perspective + distortion effect, which I want to correct once I have found the corresponding pair matches.
It would be better if you had provided some examples of your images, so that we could understand your case better. From the description, it seems that you are dealing with camera lens distortion.
The typical approach is to perform camera calibration once, then undistort each frame, and finally work with images where straight lines look straight. All of these tasks are possible with OpenCV; consider the link above.
If you cannot perform camera calibration to estimate the distortion, there isn't much you can do. Try to calculate and apply the homography on the unrectified images; if the cameras don't have wide-angle lenses, this should look OK (consider this case for example).
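The observation in the question (lines not staying lines) can be checked numerically. A homography always maps collinear points to collinear points, whereas a lens-like radial distortion generally does not. The sketch below uses a made-up homography and a one-parameter radial model about the origin with an arbitrary coefficient `k`; both are assumptions chosen only to illustrate the difference, not a calibrated camera model.

```python
def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography given as nested lists."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def radial_distort(point, k):
    """Illustrative one-parameter radial distortion centred on the origin."""
    x, y = point
    factor = 1.0 + k * (x * x + y * y)
    return (x * factor, y * factor)


def collinearity_residual(p, q, r):
    """Twice the signed area of triangle pqr; zero when the points are collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


H = [[1.0, 0.2, 5.0], [0.1, 0.9, -3.0], [0.001, 0.002, 1.0]]  # arbitrary example
line = [(-10.0, 5.0), (0.0, 5.0), (10.0, 5.0)]  # three collinear points

warped = [apply_homography(H, p) for p in line]
distorted = [radial_distort(p, 0.001) for p in line]

homography_residual = collinearity_residual(*warped)
distortion_residual = collinearity_residual(*distorted)
print(homography_residual, distortion_residual)
```

A residual that stays clearly away from zero for matched points that are collinear in the reference suggests that a plain homography will not fit the data well.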

What are keypoints in image processing?

When using OpenCV for example, algorithms like SIFT or SURF are often used to detect keypoints. My question is what actually are these keypoints?
I understand that they are some kind of "points of interest" in an image. I also know that they are scale invariant and are circular.
Also, I found out that they have orientation but I couldn't understand what this actually is. Is it an angle, but between the radius and something? Can you give some explanation? I think what I need first is something simpler, and after that it will be easier to understand the papers.
Let's tackle each point one by one:
My question is what actually are these keypoints?
Keypoints are the same thing as interest points. They are spatial locations, or points in the image that define what is interesting or what stands out in the image. Interest point detection is actually a subset of blob detection, which aims to find interesting regions or spatial areas in an image. The reason why keypoints are special is that no matter how the image changes... whether the image rotates, shrinks/expands, is translated (all of these would be an affine transformation by the way...) or is subject to distortion (i.e. a projective transformation or homography), you should be able to find the same keypoints in this modified image when comparing with the original image. Here's an example from a post I wrote a while ago:
Source: 'module' object has no attribute 'drawMatches' opencv python
The image on the right is a rotated version of the left image. I've also only displayed the top 10 matches between the two images. If you take a look at the top 10 matches, these are points that we probably would want to focus on that would allow us to remember what the image was about. We would want to focus on the face of the cameraman as well as the camera, the tripod and some of the interesting textures on the buildings in the background. You see that these same points were found between the two images and these were successfully matched.
Therefore, what you should take away from this is that these are points in the image that are interesting and that they should be found no matter how the image is distorted.
I understand that they are some kind of "points of interest" of an image. I also know that they are scale invariant and I know they are circular.
You are correct. Scale invariant means that no matter how you scale the image, you should still be able to find those points.
Now we are going to venture into the descriptor part. What makes keypoints different between frameworks is the way you describe these keypoints. These are what are known as descriptors. Each keypoint that you detect has an associated descriptor that accompanies it. Some frameworks only do a keypoint detection, while other frameworks are simply a description framework and they don't detect the points. There are also some that do both - they detect and describe the keypoints. SIFT and SURF are examples of frameworks that both detect and describe the keypoints.
Descriptors are primarily concerned with both the scale and the orientation of the keypoint. We've nailed down the concept of keypoints, but we need the descriptor part if our purpose is to match keypoints between different images. Now, what you mean by "circular"... that correlates with the scale that the point was detected at. Take for example this image that is taken from the VLFeat Toolbox tutorial:
You see that any points that are yellow are interest points, but some of these points have a different circle radius. These deal with scale. How interest points work in a general sense is that we decompose the image into multiple scales. We check for interest points at each scale, and we combine all of these interest points together to create the final output. The larger the "circle", the larger the scale was that the point was detected at. Also, there is a line that radiates from the centre of the circle to the edge. This is the orientation of the keypoint, which we will cover next.
Also I found out that they have orientation but I couldn't understand what actually it is. It is an angle but between the radius and something?
Basically, if you want to detect keypoints regardless of scale and orientation, when they talk about the orientation of keypoints, what they really mean is that they search a pixel neighbourhood that surrounds the keypoint and figure out how this pixel neighbourhood is oriented, or what direction this patch is oriented in. It depends on what descriptor framework you look at, but the general gist is to detect the most dominant orientation of the gradient angles in the patch. This is important for matching so that you can match keypoints together. Take a look at the first figure I have with the two cameramen - one rotated while the other isn't. If you take a look at some of those points, how do we figure out how one point matches with another? We can easily identify that the top of the cameraman as an interest point matches with the rotated version because we take a look at the points that surround the keypoint and see what orientation all of these points are in... and from there, that's how the orientation is computed.
Usually when we want to detect keypoints, we just take a look at the locations. However, if you want to match keypoints between images, then you definitely need the scale and the orientation to facilitate this.
I'm not as familiar with SURF, but I can tell you about SIFT, which SURF is based on. I provided a few notes about SURF at the end, but I don't know all the details.
SIFT aims to find highly-distinctive locations (or keypoints) in an image. The locations are not merely 2D locations on the image, but locations in the image's scale space, meaning they have three coordinates: x, y, and scale. The process for finding SIFT keypoints is:
1. blur and resample the image with different blur widths and sampling rates to create a scale-space
2. use the difference of gaussians method to detect blobs at different scales; the blob centers become our keypoints at a given x, y, and scale
3. assign every keypoint an orientation by calculating a histogram of gradient orientations for every pixel in its neighborhood and picking the orientation bin with the highest number of counts
4. assign every keypoint a 128-dimensional feature vector based on the gradient orientations of pixels in 16 local neighborhoods
Step 2 gives us scale invariance, step 3 gives us rotation invariance, and step 4 gives us a "fingerprint" of sorts that can be used to identify the keypoint. Together they can be used to match occurrences of the same feature at any orientation and scale in multiple images.
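The orientation step can be sketched in a few lines. The version below is a simplified illustration, not the SIFT implementation: it uses central differences for the gradient, weights each vote by gradient magnitude, and returns the centre of the strongest of 36 bins. Real SIFT additionally applies a Gaussian weighting around the keypoint and interpolates the histogram peak. The function name `dominant_orientation` and the list-of-rows patch format are assumptions.

```python
import math


def dominant_orientation(patch, bins=36):
    """Centre angle (degrees) of the strongest gradient-orientation bin.

    patch: list of rows of grey values around a keypoint.
    """
    hist = [0.0] * bins
    height, width = len(patch), len(patch[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central-difference gradient at (x, y)
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(dx, dy)
            if magnitude == 0:
                continue
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            hist[int(angle // (360.0 / bins)) % bins] += magnitude
    best = max(range(bins), key=hist.__getitem__)
    return (best + 0.5) * 360.0 / bins


ramp = [[x + y for x in range(7)] for y in range(7)]       # brightens along +x and +y
rotated = [[x - y for x in range(7)] for y in range(7)]    # same ramp turned by 90 degrees
print(dominant_orientation(ramp), dominant_orientation(rotated))
```

Because the descriptor is then computed relative to this angle, the two patches above would end up with comparable descriptors even though their dominant orientations differ.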
SURF aims to accomplish the same goals as SIFT but uses some clever tricks in order to increase speed.
For blob detection, it uses the determinant of Hessian method. The dominant orientation is found by examining the horizontal and vertical responses to Haar wavelets. The feature descriptor is similar to SIFT, looking at orientations of pixels in 16 local neighborhoods, but results in a 64-dimensional vector.
SURF features can be calculated up to 3 times faster than SIFT features, yet are just as robust in most situations.
For reference:
A good SIFT tutorial
An introduction to SURF

Whether the SIFT is rotation invariant feature or not opencv

I want to write code in OpenCV that proves whether SIFT is a rotation-invariant feature or not.
Assume that the image has one keypoint, which is the center of the image. I want to calculate the keypoint descriptor (magnitude and direction). What is a keypoint? Is it a location in the image?
I searched for a simple tutorial or code to learn what to do, but I didn't find anything simple.
A keypoint is an interesting point in your image. These points are usually found when you have a change in intensity, for example, at the edges between two objects in the image. A keypoint encodes, among other things, the location of the point in the image. SIFT will then extract a local feature descriptor for your keypoint which you can then use for image matching.
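Scale Invariant Feature Transform (SIFT) is scale invariant, as the acronym says. Although the name mentions only scale, SIFT also assigns each keypoint a dominant orientation and computes the descriptor relative to it, so it is designed to be rotation invariant as well; in practice, matching is not always perfect under rotation. SURF is an alternative, but it is a little problematic for real-time applications.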
SIFT: http://en.wikipedia.org/wiki/Scale-invariant_feature_transform
SURF: http://www.vision.ee.ethz.ch/~surf/papers.html
Example code: Trying to match two images using sift in OpenCv, but too many matches
To test your SIFT code, you could create a black 512x512 image in OpenCV with three equally spaced white points along its width. Then rotate the image by small angles, measure the angle, and check the feature matches. As you do this, you can see whether the feature matches get thrown off for large rotations.

How to compare two contours of a binary pattern image?

I'm creating a part scanner in C that pulls all possibilities for scanned parts as images in a directory. My code currently fetches all images from that directory and dumps them into a vector. I then produce groups of contours for all the images. The program then falls into a while loop where it constantly grabs images from a webcam, and generates contours for those as well. I have set up a jig for the part to rest on, so orientation and size are not a concern, however I don't want to have to calibrate the machine, so there may be movement between the template images and the part images taken.
What is the best way to compare the contours? I have tried several methods including matchTemplate without contours, but if you take a look at the two parts below, you can see that these two are very close to each other, so matchShapes and matchTemplate can't distinguish between them the way I was using them. I'm also not sure how to use cvMatchShapes. It works with just loading the images directly into match shapes, but the results are inconclusive. I think that contours is the way to go, I'm just not sure of how to go about implementing the comparison phase. Any help would be great.
You can view the templates here: http://www.cryogendesign.com/partDetection.html
If you are ready for do-it-yourself, one approach could be to compute a "distance image" (assign every pixel the smallest Euclidean distance to the contour taken as the reference). See http://en.wikipedia.org/wiki/Distance_transform.
Using this distance image, you can quickly compute the average distance of a new contour to the reference one (for every contour pixel, get the distance from the distance image). The average distance gives you an indication of the goodness-of-fit and will let you find the best match to a set of reference templates.
If the parts have some moving freedom, the situation is a bit harder: before computing the average distance, you must fit the new contour to the reference one. You will need to apply a suitable transform (translation, rotation, possibly scaling), and find the parameters that will minimize... the average distance.
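A minimal sketch of the distance-image idea follows. It computes the distance image by brute force, which is only practical for small images; the function names and the representation of a contour as a list of (x, y) pixel coordinates are assumptions for illustration. It also skips the fitting step described above and compares the contours as they are.

```python
import math


def distance_image(width, height, reference):
    """Each pixel holds the Euclidean distance to the nearest reference contour pixel."""
    return [[min(math.hypot(x - rx, y - ry) for rx, ry in reference)
             for x in range(width)]
            for y in range(height)]


def average_distance(dist, contour):
    """Mean distance-image value sampled at the pixels of a new contour."""
    return sum(dist[y][x] for x, y in contour) / len(contour)


reference = [(2, y) for y in range(5)]   # vertical line at x = 2
close_fit = [(3, y) for y in range(5)]   # one pixel to the right
poor_fit = [(5, y) for y in range(5)]    # three pixels to the right

dist = distance_image(6, 5, reference)
close_score = average_distance(dist, close_fit)
poor_score = average_distance(dist, poor_fit)
print(close_score, poor_score)
```

The distance image is computed once per reference template, so scoring each new contour only costs one lookup per contour pixel. The template with the lowest average distance is the best match.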
You can calculate the chamfer distance between the two contours:
T and E are the sets of edges of the template and the image, and x is the reference point where you start comparing the two sets of edges. So for each x you get a different value.
DT is the distance transform of an image. Matlab provides the algorithm here.
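Written out with the symbols just described, a commonly used form of the chamfer distance is the following. This form is an assumption, since the answer's own formula is not reproduced here. It averages the distance-transform value of the image edges over every template edge point t after shifting the template by x:

```latex
D_{\mathrm{chamfer}}(T, E, x) = \frac{1}{|T|} \sum_{t \in T} DT_{E}(t + x)
```

Here |T| is the number of template edge points and DT_E is the distance transform computed from the image edges E, so a smaller value means a better fit at position x.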
If you want a more detailed version of how to calculate the chamfer distance, take a look here.
