OpenCV: How to get inlier points using findHomography()/findFundamentalMat() and RANSAC

OpenCV does not provide a RANSAC function per se, or at least not in a form that you can simply call and be done with (e.g. cv::ransac(...)). All functions/methods that are able to use RANSAC have a flag that enables it. However, this is not always enough if you actually want to do something else with the inliers RANSAC computes after you have estimated a homography or fundamental matrix, for example create a nice plot of the points in Octave or similar software, apply additional algorithms to the remaining set of filtered matches, etc.
After matching two images one gets a vector of matches. Along with that we of course have two sets of keypoints (one for each image) that were used in the matching process. Using the matches and keypoints we create two vectors of points (e.g. cv::Point2f) and pass these to findHomography(). From a couple of related posts I discovered how exactly the inliers are marked, using a mask that we pass to that function: each row inside the mask relates to an inlier/outlier. However, I was unable to figure out how to use the row index to get back to my two sets of points. Looking at OpenCV's source code didn't get me far either. In findFundamentalMat() (similar to findHomography() when it comes to its signature and the mask) they use compressPoints(), which seems to somehow combine the two input sets (source and destination points) into one. While testing to determine the nature of the mask I tried two sets of matched points (converted from cv::KeyPoint to cv::Point2f, a standard procedure). Each set contains 300 points, so in total we have 600 points. The returned mask contains 300 rows (the values are not important for the topic at hand).
EDIT: While writing this I discovered the answer (see below) but decided to post the question anyway, in case someone needs this information quickly and in compact form. Note that we still need one of OpenCV's functions that support RANSAC. So if you have a set of points but no intention of computing a homography or fundamental matrix, this is obviously not the way, and I dare say I was unable to find anything in OpenCV's API that helps avoid this obstacle; in that case you need an external library.

The solution is actually quite trivial. As we know, each row in our mask tells us whether we have an inlier or an outlier. But we have two sets of points as input, so how exactly does a row containing a single value represent two points? The answer comes from thinking about how those two sets of points enter findHomography() (in my case I was computing the homography between two images). Both sets contain the same number of points for the simple reason that they are extracted from the matches between our pair of images. This means that the row index in the mask is also the index of the corresponding point in each of the two sets, and the index in the vector of matches between the two images. I have successfully managed to manually refer to a small subset of matched points based on this, and the results are as expected. It is important that you don't alter the order of your matches or of the 2D points you have extracted from them using the keypoints referenced in each cv::DMatch. Below you can see a simple example for a single pair of inliers.
std::vector<cv::Point2f> pointsObject, pointsScene;
for(size_t i = 0; i < matchesObjectScene.size(); ++i)
{
    // extract points from keypoints based on matches
    pointsObject.push_back(keypointsObject.at(matchesObjectScene.at(i).queryIdx).pt);
    pointsScene.push_back(keypointsScene.at(matchesObjectScene.at(i).trainIdx).pt);
}
// compute homography using RANSAC
cv::Mat mask;
cv::Mat H = cv::findHomography(pointsObject, pointsScene, cv::RANSAC, ransacThreshold, mask); // use CV_RANSAC in OpenCV 2.x
In the example above, if we print the matched pair of points at some row of the mask
int maskRow = 10;
std::cout << "POINTS: object(" << pointsObject.at(maskRow).x << "," << pointsObject.at(maskRow).y << ") - scene(" << pointsScene.at(maskRow).x << "," << pointsScene.at(maskRow).y << ")" << std::endl;
and then again but this time using our keypoints (can also be done with the extracted 2D points)
std::cout << "POINTS (via match-set): object(" << keypointsObject.at(matchesCurrentObject.at(maskRow).queryIdx).pt.x << "," << keypointsObject.at(matchesCurrentObject.at(maskRow).queryIdx).pt.y << ") - scene(" << keypointsScene.at(matchesCurrentObject.at(maskRow).trainIdx).pt.x << "," << keypointsScene.at(matchesCurrentObject.at(maskRow).trainIdx).pt.y << ")" << std::endl;
we actually get the same output:
POINTS: object(462,199) - scene(485,49)
POINTS (via match-set): object(462,199) - scene(485,49)
To tell whether the pair at a given row is actually an inlier, we simply check whether that row in the mask contains a zero or a non-zero value:
if(mask.at<uchar>(maskRow))
    // store the match, keypoints or points somewhere where you can access them later
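Putting it together, a minimal sketch of the whole filtering step might look like this (variable names follow the snippets above; treat it as an illustration rather than a drop-in implementation):
// keep only the matches/points that RANSAC marked as inliers
std::vector<cv::DMatch> inlierMatches;
std::vector<cv::Point2f> inlierObject, inlierScene;
for(int row = 0; row < mask.rows; ++row)
{
    if(mask.at<uchar>(row))
    {
        inlierMatches.push_back(matchesObjectScene.at(row));
        inlierObject.push_back(pointsObject.at(row));
        inlierScene.push_back(pointsScene.at(row));
    }
}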

On a different note: it may not make sense for RANSAC to exist as a standalone function in OpenCV, because RANSAC is an abstract technique for rejecting outliers. RANSAC relies on a base model to perform the outlier rejection, and that base model is very generic; it could be anything, not necessarily points that have some relationship among themselves. This is probably why RANSAC only exists as a flag inside functions that perform some well-defined task, like findHomography, findFundamentalMat, etc.

Related

Image Analysis: sift / harris / affine / RANSAC

I am not sure if this falls under the criteria of a proper question, but still, I would like to give it a shot.
I am looking for a library or function that takes two sets of SIFT descriptors in the form of a file (or a matrix) of [number_of_keypoints][feature_0...feature_127] - meaning 128 features per keypoint - and allows comparison of images (I am using the Harris-Affine detector to extract them: http://www.robots.ox.ac.uk/~vgg/research/affine/det_eval_files/extract_features2.tar.gz ).
I am interested in a method that would allow me to find mutual nearest neighbours, and that would accept the number of keypoints in the neighbourhood and a success ratio.
E.g.
Let's say I have two files with keypoints (described by SIFT descriptors): image_1.sift and image_2.sift. I would like the method to accept the number of keypoints in the neighbourhood and a match ratio, where the match ratio means, in pseudocode:
For each keypoint in image_1
    Pick 50 nearest neighbours from image_1 -> List<KeyPoints> neighbours_1
    For each keypoint in image_2
        Pick 50 nearest neighbours from image_2 -> List<KeyPoints> neighbours_2
        int numberOfMatches = 0;
        foreach(neighbour in neighbours_1)
        {
            if(neighbour == neighbours_2.Find(neighbour))
                numberOfMatches++;
        }
The ratio is the number of matches to the number of keypoints taken into consideration.
For example FindMutualKeypoints(image_1, image_2, 50, 0.7)
It can be a C#, Java, Python or MATLAB implementation. I don't deal with image analysis on a regular basis, and before I start writing my own implementation I assumed there is probably one out there already. I am having trouble finding the correct terms in English (the translation from my mother tongue seems to use quite different terms), which is probably why I have not found it yet.
I think OpenCV is the way to go.
Here is an example for it: link
It uses SURF descriptors, but you can also use SIFT.
You then call the FLANN matcher, which also gives you information about the quality of the matches.
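As an illustration, a minimal sketch of that pipeline with OpenCV in C++ might look like the following (SIFT detection, FLANN knn-matching, Lowe's ratio test; the class names are those of OpenCV 4.x and the 0.7 ratio is just a common default, so adapt as needed):
#include <opencv2/features2d.hpp>
#include <opencv2/imgcodecs.hpp>

cv::Mat img1 = cv::imread("image_1.png", cv::IMREAD_GRAYSCALE);
cv::Mat img2 = cv::imread("image_2.png", cv::IMREAD_GRAYSCALE);

// detect keypoints and compute SIFT descriptors for both images
cv::Ptr<cv::SIFT> sift = cv::SIFT::create();
std::vector<cv::KeyPoint> kp1, kp2;
cv::Mat desc1, desc2;
sift->detectAndCompute(img1, cv::noArray(), kp1, desc1);
sift->detectAndCompute(img2, cv::noArray(), kp2, desc2);

// FLANN matcher: the distance of the two nearest neighbours tells us
// how distinctive (good) each match is
cv::FlannBasedMatcher matcher;
std::vector<std::vector<cv::DMatch>> knn;
matcher.knnMatch(desc1, desc2, knn, 2);

std::vector<cv::DMatch> good;
for(const auto& m : knn)
    if(m.size() == 2 && m[0].distance < 0.7f * m[1].distance) // ratio test
        good.push_back(m[0]);
If you specifically want mutual nearest neighbours, cv::BFMatcher constructed with crossCheck=true keeps only matches that are each other's nearest neighbour.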

OpenCV Feature matching to check similarity between two scenes

I am working on a project in which an AR drone has to follow a path based on a list of checkpoints saved in a directory. Each checkpoint is a scene that the drone should detect along its path. There can be differences between the checkpoints and the actual scenes in terms of brightness, small obstacles present in the actual scenes, or small variations in the point of view. To detect the checkpoints while the drone is moving, I decided to use feature matching: I compute the number of good matches and the ratio between inliers and good matches, and use these two values to decide whether a checkpoint has been reached.
Algorithm:
convert the image to grayscale
use a detector to detect the keypoints (I have tried SIFT, SURF, ORB and AKAZE)
use an extractor to calculate the feature vectors
use a matching algorithm to perform the matching (I have tried Bruteforce and Bruteforce-Hamming)
keep only the good matches and compute the number of inliers
check if the ratio between inliers and good matches is above a threshold and the number of good matches is above another threshold; if this condition holds, the checkpoint has been matched
Results: the checking algorithm works reasonably well, but sometimes it detects a checkpoint (taken from a landed drone) only after crossing it, and with the same checkpoint it fails to detect it when the drone is slightly shifted to the left compared to the position from which the checkpoint was taken.
Is this a good approach for the problem, or is there a better way to reach my goal? If it is a good way, how can I improve the checking when the drone is close to the checkpoint?
The code that implements the feature matching is shown below:
// knn-match the descriptors of the current frame against the checkpoint
matcher->knnMatch(desc1, desc2, dmatches, KNN_best_matches);
vector<Point2f> matches, inliers;
// matches2points_nndr() applies the nearest-neighbour distance ratio test and stores
// both points of every surviving pair in 'matches' (hence the later division by 2)
if(matches2points_nndr(kp1, kp2, dmatches, matches, DRATIO, MIN_MATCH_COUNT)){
    *match = true;
    // compute inliers with RANSAC
    compute_inliers_ransac(matches, inliers, MAX_H_ERROR, false);
    // update stats
    stats.matches = (int)matches.size()/2;
    stats.inliers = (int)inliers.size()/2;
    stats.outliers = stats.matches - stats.inliers;
    stats.ratio = (float)stats.inliers * 100.0f / (float)stats.matches;
}
In another class, stats.ratio is compared with a threshold.
if(draw_stats.ratio > threshold_matching){
    // move to the next checkpoint
    match = true;
} else {
    std::cout << "ratio is under the threshold: " << draw_stats.ratio << std::endl;
    match = false;
}
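For what it's worth, the inlier count and ratio could also be obtained directly from the RANSAC mask returned by cv::findHomography, as described in the first answer above; a rough sketch (pts1/pts2 stand for the good-match points, and the threshold names are reused from the question, so this is illustrative rather than the original code):
// pts1/pts2 hold the good-match points of the checkpoint and the current frame
cv::Mat mask;
cv::Mat H = cv::findHomography(pts1, pts2, cv::RANSAC, 3.0, mask);
int inlierCount = mask.empty() ? 0 : cv::countNonZero(mask); // non-zero mask rows are inliers
float inlierRatio = 100.0f * inlierCount / (float)pts1.size();
bool checkpointReached = inlierCount >= MIN_MATCH_COUNT && inlierRatio > threshold_matching;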

Convolution Vs Correlation

Can anyone explain to me the similarities and differences between correlation and convolution? Please explain the intuition behind them, not the mathematical equations (i.e. flipping the kernel/impulse). Application examples in the image processing domain for each would be appreciated too.
You will likely get a much better answer on the DSP Stack Exchange, but for starters: there are a number of similar terms, and their definitions can be tricky to pin down.
1. Correlation
2. Cross correlation
3. Convolution
4. Correlation coefficient
5. Sliding dot product
6. Pearson correlation
1, 2, 3 and 5 are very similar.
4 and 6 are similar.
Note that all of these terms have dot products rearing their heads.
You asked about correlation and convolution - these are conceptually the same, except that in convolution the filter is flipped before the sweep. I suspect that you may actually have been asking about the difference between a correlation coefficient (such as Pearson) and convolution/correlation.
Prerequisites
I am assuming that you know how to compute the dot-product. Given two equal sized vectors v and w each with three elements, the algebraic dot product is v[0]*w[0]+v[1]*w[1]+v[2]*w[2]
There is a lot of theory behind the dot product in terms of what it represents etc....
Notice the dot product is a single number (scalar) representing the mapping between these two vectors/points v, w. In geometry one frequently computes the cosine of the angle between two vectors, which uses the dot product. The cosine of the angle between two vectors is between -1 and 1 and can be thought of as a measure of similarity.
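As a small illustration of this prerequisite, the dot product and cosine similarity could be computed like this (a plain C++ sketch, not from the original answer):
#include <cmath>
#include <vector>

// algebraic dot product of two equal-length vectors
double dot(const std::vector<double>& v, const std::vector<double>& w) {
    double s = 0.0;
    for (size_t i = 0; i < v.size(); ++i) s += v[i] * w[i];
    return s;
}

// cosine of the angle between v and w: dot(v, w) / (|v| * |w|), lies in [-1, 1]
double cosineSimilarity(const std::vector<double>& v, const std::vector<double>& w) {
    return dot(v, w) / (std::sqrt(dot(v, v)) * std::sqrt(dot(w, w)));
}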
Correlation coefficient (Pearson)
The correlation coefficient between equal-length v and w is simply the dot product of the two zero-mean signals (subtract the mean of v from v to get zmv, and the mean of w from w to get zmw - here zm is shorthand for zero mean), divided by the magnitudes of zmv and zmw: corr(v, w) = dot(zmv, zmw) / (|zmv| * |zmw|).
This produces a number between -1 and 1. Close to zero means little correlation, close to +/- 1 means high correlation; it measures the similarity between the two vectors.
See http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient for a better definition.
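Building on the dot product helpers sketched above, a Pearson correlation coefficient could then be computed as the cosine similarity of the mean-centred signals (again illustrative, not from the original answer):
#include <numeric>

double pearson(const std::vector<double>& v, const std::vector<double>& w) {
    const double meanV = std::accumulate(v.begin(), v.end(), 0.0) / v.size();
    const double meanW = std::accumulate(w.begin(), w.end(), 0.0) / w.size();
    std::vector<double> zmv(v.size()), zmw(w.size());
    for (size_t i = 0; i < v.size(); ++i) { zmv[i] = v[i] - meanV; zmw[i] = w[i] - meanW; }
    return cosineSimilarity(zmv, zmw); // in [-1, 1]
}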
Convolution and Correlation
When we correlate/convolve v1 and v2 we are basically computing a series of dot products and putting them into an output vector. Let's say v1 has three elements and v2 has 10. The dot products we compute for correlation are as follows:
output[0] = v1[0]*v2[0]+v1[1]*v2[1]+v1[2]*v2[2]
output[1] = v1[0]*v2[1]+v1[1]*v2[2]+v1[2]*v2[3]
output[2] = v1[0]*v2[2]+v1[1]*v2[3]+v1[2]*v2[4]
output[3] = v1[0]*v2[3]+v1[1]*v2[4]+v1[2]*v2[5]
output[4] = v1[0]*v2[4]+v1[1]*v2[5]+v1[2]*v2[6]
output[5] = v1[0]*v2[5]+v1[1]*v2[6]+v1[2]*v2[7]
output[6] = v1[0]*v2[6]+v1[1]*v2[7]+v1[2]*v2[8]
output[7] = v1[0]*v2[7]+v1[1]*v2[8]+v1[2]*v2[9]
For a true convolution the filter v1 is flipped before the sweep, so each dot product uses v1 in reverse order:
output[0] = v1[2]*v2[0]+v1[1]*v2[1]+v1[0]*v2[2]
output[1] = v1[2]*v2[1]+v1[1]*v2[2]+v1[0]*v2[3]
output[2] = v1[2]*v2[2]+v1[1]*v2[3]+v1[0]*v2[4]
output[3] = v1[2]*v2[3]+v1[1]*v2[4]+v1[0]*v2[5]
output[4] = v1[2]*v2[4]+v1[1]*v2[5]+v1[0]*v2[6]
output[5] = v1[2]*v2[5]+v1[1]*v2[6]+v1[0]*v2[7]
output[6] = v1[2]*v2[6]+v1[1]*v2[7]+v1[0]*v2[8]
output[7] = v1[2]*v2[7]+v1[1]*v2[8]+v1[0]*v2[9]
Notice that the output has fewer than 10 elements because, for simplicity, I am computing the correlation/convolution only where v1 and v2 fully overlap (the "valid" part).
Notice also that the convolution is simply a number of dot products. There has been considerable work over the years to speed up convolutions: the sweeping dot products are slow, and they can be sped up by first transforming the vectors into the Fourier domain, multiplying them element-wise there, and transforming the result back, though I won't go into that here...
You might want to look at these resources as well: Calculating Pearson correlation and significance in Python.
The best answer I got was from this document: http://www.cs.umd.edu/~djacobs/CMSC426/Convolution.pdf
I'm just going to copy the excerpt from the doc:
"The key difference between the two is that convolution is associative. That is, if F and G are filters, then F*(GI) = (FG)*I. If you don’t believe this, try a simple example, using F=G=(-1 0 1), for example. It is very convenient to have convolution be associative. Suppose, for example, we want to smooth an image and then take its derivative. We could do this by convolving the image with a Gaussian filter, and then convolving it with a derivative filter. But we could alternatively convolve the derivative filter with the Gaussian to produce a filter called a Difference of Gaussian (DOG), and then convolve this with our image. The nice thing about this is that the DOG filter can be precomputed, and we only have to convolve one filter with our image.
In general, people use convolution for image processing operations such as smoothing, and they use correlation to match a template to an image. Then, we don’t mind that correlation isn’t associative, because it doesn’t really make sense to combine two templates into one with correlation, whereas we might often want to combine two filter together for convolution."
Convolution is just like correlation, except that we flip over the filter before correlating
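To make the sliding-dot-product picture concrete, here is a small C++ sketch of "valid" correlation and convolution over std::vector (illustrative only; real code would use a library routine or an FFT-based method):
#include <vector>

// "valid" sliding dot product: out[i] = sum_j kernel[j] * signal[i + j]
std::vector<double> correlate(const std::vector<double>& signal,
                              const std::vector<double>& kernel) {
    std::vector<double> out(signal.size() - kernel.size() + 1, 0.0);
    for (size_t i = 0; i < out.size(); ++i)
        for (size_t j = 0; j < kernel.size(); ++j)
            out[i] += kernel[j] * signal[i + j];
    return out;
}

// convolution = correlation with the kernel flipped
std::vector<double> convolve(const std::vector<double>& signal,
                             const std::vector<double>& kernel) {
    std::vector<double> flipped(kernel.rbegin(), kernel.rend());
    return correlate(signal, flipped);
}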

How to match features in only a part of two given images?

I have two images.
After finding the keypoints and descriptors, I want to search for matches to the features of image 1 in only a particular part of image 2.
Can I achieve this through the matchesMask parameter when matching?
Or, is there any other method?
Please let me know.
P.S. - I am using the FAST detector, ORB extractor and BFMatcher as of now.
I would copy the "particular part of image 2" into another matrix, and use it for the detection / matching.
For instance, if you wanted to create a matrix pointing to the region of "image2" defined by the first 5 columns and 10 rows, you could do:
cv::Mat subMatrix = image2.colRange(0, 5).rowRange(0, 10);
And then you would use subMatrix for the matching.
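A rough end-to-end sketch of that idea is below (keypoints detected on the sub-matrix are relative to the region, so they have to be shifted back to full-image coordinates; the region size, detector and variable names are illustrative):
// detect/match only inside a region of interest of image2
cv::Rect roi(0, 0, 5, 10);                        // x, y, width, height
cv::Mat subMatrix = image2(roi);                  // header pointing into image2, no copy

cv::Ptr<cv::ORB> orb = cv::ORB::create();
std::vector<cv::KeyPoint> kp1, kpRoi;
cv::Mat desc1, descRoi;
orb->detectAndCompute(image1, cv::noArray(), kp1, desc1);
orb->detectAndCompute(subMatrix, cv::noArray(), kpRoi, descRoi);

// shift the ROI keypoints back into image2 coordinates
for (auto& kp : kpRoi)
    kp.pt += cv::Point2f((float)roi.x, (float)roi.y);

cv::BFMatcher matcher(cv::NORM_HAMMING);          // Hamming norm for binary ORB descriptors
std::vector<cv::DMatch> matches;
matcher.match(desc1, descRoi, matches);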

RANSAC Algorithm

Can anybody please show me how to use the RANSAC algorithm to select common feature points in two images which have a certain portion of overlap? The problem comes from feature-based image stitching.
I implemented an image stitcher a couple of years back. The article on RANSAC on Wikipedia describes the general algorithm well.
When using RANSAC for feature-based image matching, what you want is to find the transform that best maps the first image onto the second image. This is the model described in the Wikipedia article.
If you have already got your features for both images and have found which features in the first image best matches which features in the second image, RANSAC would be used something like this.
The input to the algorithm is:
n - the number of random points to pick every iteration in order to create the transform. I chose n = 3 in my implementation.
k - the number of iterations to run.
t - the threshold on the squared distance for a point to be considered a match.
d - the number of points that need to be matched for the transform to be valid.
image1_points and image2_points - two arrays of points of the same size. Assumes that image1_points[x] is best mapped to image2_points[x] according to the computed features.
best_model = null
best_error = Inf
for i = 0:k
    rand_indices = n random integers from 0:num_points
    base_points  = image1_points[rand_indices]
    input_points = image2_points[rand_indices]
    maybe_model  = find best transform from input_points -> base_points

    consensus_set = 0
    total_error   = 0
    for j = 0:num_points
        error = squared distance between image2_points[j] transformed by maybe_model and image1_points[j]
        if error < t
            consensus_set += 1
            total_error   += error

    if consensus_set > d && total_error < best_error
        best_model = maybe_model
        best_error = total_error
The end result is the transform that best maps the points in image2 to image1, which is exactly what you want when stitching.
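For reference, a hedged C++ sketch of this loop using OpenCV types is shown below, with an affine transform fitted to the three sampled pairs via cv::getAffineTransform (a simplification; in practice cv::findHomography(..., cv::RANSAC, ...) already wraps this whole procedure, and a real implementation would also guard against degenerate samples):
#include <opencv2/imgproc.hpp>
#include <limits>
#include <random>
#include <vector>

// minimal RANSAC over an affine model; pts1[i] is assumed to correspond to pts2[i]
cv::Mat ransacAffine(const std::vector<cv::Point2f>& pts1,
                     const std::vector<cv::Point2f>& pts2,
                     int k = 1000, double t = 9.0, int d = 10) {
    std::mt19937 rng(42);
    std::uniform_int_distribution<size_t> pick(0, pts1.size() - 1);
    cv::Mat bestModel;
    double bestError = std::numeric_limits<double>::infinity();

    for (int iter = 0; iter < k; ++iter) {
        // sample n = 3 correspondences and fit an affine transform image2 -> image1
        cv::Point2f src[3], dst[3];
        for (int j = 0; j < 3; ++j) { size_t r = pick(rng); src[j] = pts2[r]; dst[j] = pts1[r]; }
        cv::Mat model = cv::getAffineTransform(src, dst); // 2x3 matrix, CV_64F

        // count the points whose squared reprojection error is below t
        int consensus = 0;
        double totalError = 0.0;
        for (size_t j = 0; j < pts1.size(); ++j) {
            double x = model.at<double>(0,0)*pts2[j].x + model.at<double>(0,1)*pts2[j].y + model.at<double>(0,2);
            double y = model.at<double>(1,0)*pts2[j].x + model.at<double>(1,1)*pts2[j].y + model.at<double>(1,2);
            double err = (x - pts1[j].x)*(x - pts1[j].x) + (y - pts1[j].y)*(y - pts1[j].y);
            if (err < t) { ++consensus; totalError += err; }
        }
        if (consensus > d && totalError < bestError) { bestModel = model; bestError = totalError; }
    }
    return bestModel; // empty if no candidate reached the consensus threshold d
}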
