OpenCV Feature matching to check similarity between two scenes

I am working on a project in which an AR drone has to follow a path defined by a list of checkpoints saved in a directory. Each checkpoint is a scene that the drone should detect along its path. The checkpoints and the actual scenes may differ in brightness, in small obstacles present only in the live scene, or in small variations of the point of view. To detect the checkpoints while the drone is moving, I decided to use feature matching: I compute the number of good matches and the ratio between the inliers and the good matches, and use these two values to decide whether a checkpoint has been reached.
Algorithm:
1. Convert the image to grayscale.
2. Use a detector to find the keypoints (I have tried SIFT, SURF, ORB and AKAZE).
3. Use an extractor to compute the feature vectors.
4. Use a matching algorithm to perform the matching (I have tried Bruteforce and Bruteforce-Hamming).
5. Keep only the good matches and compute the number of inliers.
6. Check that the ratio between inliers and good matches is above one threshold and that the number of good matches is above another threshold. If both conditions hold, the checkpoint has been matched.
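For reference, here is a minimal, self-contained sketch of such a pipeline (this is my own illustration, not the project's code; it assumes OpenCV 3.x or newer, AKAZE with a Hamming brute-force matcher, and placeholder thresholds):

#include <opencv2/features2d.hpp>
#include <opencv2/calib3d.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

// Returns true if 'scene' matches the 'checkpoint' image.
bool checkpointReached(const cv::Mat& checkpoint, const cv::Mat& scene,
                       float dratio = 0.8f, int minGoodMatches = 15,
                       float minInlierRatio = 30.0f /* percent */)
{
    cv::Mat gray1, gray2;
    cv::cvtColor(checkpoint, gray1, cv::COLOR_BGR2GRAY);
    cv::cvtColor(scene, gray2, cv::COLOR_BGR2GRAY);

    // 1-3. Detect keypoints and compute descriptors (AKAZE here; SIFT/ORB work too).
    cv::Ptr<cv::AKAZE> akaze = cv::AKAZE::create();
    std::vector<cv::KeyPoint> kp1, kp2;
    cv::Mat desc1, desc2;
    akaze->detectAndCompute(gray1, cv::noArray(), kp1, desc1);
    akaze->detectAndCompute(gray2, cv::noArray(), kp2, desc2);

    // 4. Brute-force kNN matching (Hamming distance for binary descriptors).
    cv::BFMatcher matcher(cv::NORM_HAMMING);
    std::vector<std::vector<cv::DMatch>> knn;
    matcher.knnMatch(desc1, desc2, knn, 2);

    // 5. Keep only the good matches (nearest-neighbour distance ratio test).
    std::vector<cv::Point2f> pts1, pts2;
    for (const auto& m : knn)
        if (m.size() == 2 && m[0].distance < dratio * m[1].distance)
        {
            pts1.push_back(kp1[m[0].queryIdx].pt);
            pts2.push_back(kp2[m[0].trainIdx].pt);
        }
    if ((int)pts1.size() < minGoodMatches) return false;

    // Count inliers with a RANSAC homography.
    std::vector<uchar> mask;
    cv::findHomography(pts1, pts2, cv::RANSAC, 3.0, mask);
    int inliers = cv::countNonZero(mask);

    // 6. Accept the checkpoint if the inlier ratio is above the threshold.
    float ratio = 100.0f * inliers / (float)pts1.size();
    return ratio > minInlierRatio;
}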
Results: The check works reasonably well, but sometimes a checkpoint that was taken from a landed drone is only detected after the drone has already crossed it, while the same checkpoint is not detected at all if the drone is slightly shifted to the left of the position from which the checkpoint image was taken.
Is this a good approach to the problem, or is there a better way to reach my goal? If it is a good approach, how can I improve the check when the drone is close to the checkpoint?
The code that implements the feature matching is shown below:
matcher->knnMatch(desc1, desc2, dmatches, KNN_best_matches);
vector<Point2f> matches, inliers;
if (matches2points_nndr(kp1, kp2, dmatches, matches, DRATIO, MIN_MATCH_COUNT))
{
    *match = true;
    // compute inliers
    compute_inliers_ransac(matches, inliers, MAX_H_ERROR, false);
    // update stats
    stats.matches  = (int)matches.size() / 2;
    stats.inliers  = (int)inliers.size() / 2;
    stats.outliers = stats.matches - stats.inliers;
    stats.ratio    = (float)stats.inliers * 100.0 / (float)stats.matches;
}
In another class, stats.ratio is compared with a threshold.
if (draw_stats.ratio > threshold_matching)
{
    // move to the next checkpoint
    match = true;
}
else
{
    std::cout << "ratio is under the threshold: " << draw_stats.ratio << std::endl;
    match = false;
}

Related

Image Analysis: sift / harris / affine / RANSAC

I am not sure if this falls under the criteria of a proper question, but still, I would like to give it a shot.
I am looking for a library or function that takes two SIFT descriptor sets, each in the form of a file (or matrix) of [number_of_keypoints][feature_0...feature_127], i.e. 128 features per keypoint, and allows two images to be compared. (I am using the Harris-Affine detector to extract them: http://www.robots.ox.ac.uk/~vgg/research/affine/det_eval_files/extract_features2.tar.gz)
I am interested in a method that finds mutual nearest neighbours and accepts the number of keypoints in the neighbourhood and a success ratio.
For example, let's say I have two files with keypoints described by SIFT descriptors (image_1.sift, image_2.sift). I would like the method to accept a number of keypoints in the neighbourhood and a match ratio, where the match ratio is defined by the following pseudocode:
For each keypoint in image_1
    Pick 50 nearest neighbours from image_1 -> List<KeyPoints> neighbours_1
    For each keypoint in image_2
        Pick 50 nearest neighbours from image_2 -> List<KeyPoints> neighbours_2
        int numberOfMatches = 0;
        foreach (neighbour in neighbours_1)
        {
            if (neighbour == neighbours_2.Find(neighbour))
                numberOfMatches++;
        }
The ratio is the number of matches divided by the number of keypoints taken into consideration.
For example FindMutualKeypoints(image_1, image_2, 50, 0.7)
It can be a C#, Java, Python or MATLAB implementation. I don't work with image analysis on a regular basis, and before I start writing my own implementation I assumed there probably already is one out there. I am having trouble finding the correct English terms (translated from my mother tongue, the terms seem quite different), which is probably why I could not find it yet.
I think openCV is the way to go.
Here is an example for it: link
It uses SURF descriptors, but you can also use SIFT.
You then call the FLANN matcher, which also gives you information about the quality of the matches.
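As a rough sketch of what that can look like in OpenCV's C++ API (this is not the linked example; cv::SIFT lives in the main module only since OpenCV 4.4, earlier versions need xfeatures2d), including a simple mutual nearest-neighbour check of the kind asked about:

#include <opencv2/features2d.hpp>
#include <vector>

// Sketch: SIFT + FLANN matching with a ratio test, plus a mutual
// nearest-neighbour filter (match both ways and keep the agreements).
std::vector<cv::DMatch> mutualMatches(const cv::Mat& img1, const cv::Mat& img2)
{
    cv::Ptr<cv::SIFT> sift = cv::SIFT::create();
    std::vector<cv::KeyPoint> kp1, kp2;
    cv::Mat d1, d2;
    sift->detectAndCompute(img1, cv::noArray(), kp1, d1);
    sift->detectAndCompute(img2, cv::noArray(), kp2, d2);

    cv::FlannBasedMatcher flann;
    std::vector<std::vector<cv::DMatch>> knn12, knn21;
    flann.knnMatch(d1, d2, knn12, 2);   // image_1 -> image_2
    flann.knnMatch(d2, d1, knn21, 1);   // image_2 -> image_1 (for the cross check)

    std::vector<cv::DMatch> good;
    for (const auto& m : knn12)
    {
        if (m.size() < 2) continue;
        // Ratio test: the best match must be clearly better than the second best.
        if (m[0].distance > 0.7f * m[1].distance) continue;
        // Mutual check: the matched keypoint's own best match must point back.
        if (!knn21[m[0].trainIdx].empty() &&
            knn21[m[0].trainIdx][0].trainIdx == m[0].queryIdx)
            good.push_back(m[0]);
    }
    return good;
}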

Why doesn't RANSAC remove all the outliers in SIFT matches?

I use SIFT to detect, describe feature points in two images as follows.
void FeaturePointMatching::SIFTFeatureMatchers(cv::Mat imgs[2], std::vector<cv::Point2f> fp[2])
{
    cv::SiftFeatureDetector dec;
    std::vector<cv::KeyPoint> kp1, kp2;
    dec.detect(imgs[0], kp1);
    dec.detect(imgs[1], kp2);

    cv::SiftDescriptorExtractor ext;
    cv::Mat desp1, desp2;
    ext.compute(imgs[0], kp1, desp1);
    ext.compute(imgs[1], kp2, desp2);

    cv::BruteForceMatcher<cv::L2<float> > matcher;
    std::vector<cv::DMatch> matches;
    matcher.match(desp1, desp2, matches);

    std::vector<cv::DMatch>::iterator iter;
    fp[0].clear();
    fp[1].clear();
    for (iter = matches.begin(); iter != matches.end(); ++iter)
    {
        //if (iter->distance > 1000)
        //    continue;
        fp[0].push_back(kp1.at(iter->queryIdx).pt);
        fp[1].push_back(kp2.at(iter->trainIdx).pt);
    }

    // remove outliers
    std::vector<uchar> mask;
    cv::findFundamentalMat(fp[0], fp[1], cv::FM_RANSAC, 3, 1, mask);
    std::vector<cv::Point2f> fp_refined[2];
    for (size_t i = 0; i < mask.size(); ++i)
    {
        if (mask[i] != 0)
        {
            fp_refined[0].push_back(fp[0][i]);
            fp_refined[1].push_back(fp[1][i]);
        }
    }
    std::swap(fp_refined[0], fp[0]);
    std::swap(fp_refined[1], fp[1]);
}
In the code above I use findFundamentalMat() to remove outliers, but in the resulting img1 and img2 there are still some bad matches. In the images, each green line connects a matched feature point pair (please ignore the red marks). I cannot find anything wrong; could anyone give me some hints? Thanks in advance.
RANSAC is just one of the robust estimators. In principle, one can use a variety of them but RANSAC has been shown to work quite well as long as your input data is not dominated by outliers. You can check other variants on RANSAC like MSAC, MLESAC, MAPSAC etc. which have some other interesting properties as well. You may find this CVPR presentation interesting (http://www.imgfsr.com/CVPR2011/Tutorial6/RANSAC_CVPR2011.pdf)
Depending on the quality of the input data, you can estimate the optimal number of RANSAC iterations as described here (https://en.wikipedia.org/wiki/RANSAC#Parameters)
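For reference, that estimate boils down to k = log(1 - p) / log(1 - w^n), where w is the assumed inlier fraction, n the number of points per sample and p the desired probability of drawing at least one outlier-free sample. A small sketch:

#include <cmath>

// Estimated number of RANSAC iterations so that, with probability p, at least
// one sample of n points is outlier-free (w = expected inlier fraction).
// Example: w = 0.5, n = 8 (8-point fundamental matrix), p = 0.99  ->  k ~ 1177.
int ransacIterations(double w, int n, double p = 0.99)
{
    double wn = std::pow(w, n);   // probability that one sample is all inliers
    return (int)std::ceil(std::log(1.0 - p) / std::log(1.0 - wn));
}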
Again, it is one of the robust estimator methods. You may take other statistical approaches like modelling your data with heavy tail distributions, trimmed least squares etc.
In your code you are missing the RANSAC step. RANSAC basically has two steps:
1. Hypothesis generation: randomly select the minimal set of data points needed to fit your model (the training data).
2. Model evaluation: evaluate the model on the rest of the points (the testing data).
Iterate and choose the model that gives the lowest testing error.
RANSAC stands for RANdom SAmple Consensus. It does not remove outliers by itself; it selects a group of points and calculates the fundamental matrix for that group. You then need to do a reprojection using the fundamental matrix just calculated with RANSAC to remove the outliers.
Like any algorithm, RANSAC is not perfect. You can try the other robust algorithms available in the OpenCV implementation, such as LMEDS. You can also reiterate, using the points marked as inliers in the last run as input to a new estimation, and you can vary the threshold and confidence level. I suggest running RANSAC one to three times and then running LMEDS, which does not need a threshold but only works well with more than 50% inliers.
You can also have geometrical problems:
* If the baseline between the two views is too small, the fundamental matrix estimation can be unreliable, and it may be better to use findHomography() instead for your purpose.
* If your images have some barrel/pincushion distortion, they do not conform to epipolar geometry and the fundamental matrix is not the correct mathematical model to link the matches. In this case you may have to calibrate your camera, run undistort(), and then process the undistorted images.
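To illustrate the reiteration idea (run RANSAC first, keep its inliers, then refine with LMEDS on them), a rough sketch; the threshold and confidence values are placeholders:

#include <opencv2/calib3d.hpp>
#include <vector>

// Sketch: estimate F with RANSAC, drop the outliers, then re-estimate with LMEDS
// on the surviving correspondences (LMEDS assumes > 50% inliers, so run it last).
cv::Mat refineFundamental(std::vector<cv::Point2f>& p1, std::vector<cv::Point2f>& p2)
{
    std::vector<uchar> mask;
    cv::findFundamentalMat(p1, p2, cv::FM_RANSAC, 3.0, 0.99, mask);

    std::vector<cv::Point2f> in1, in2;
    for (size_t i = 0; i < mask.size(); ++i)
        if (mask[i]) { in1.push_back(p1[i]); in2.push_back(p2[i]); }

    p1 = in1;   // keep only the RANSAC inliers
    p2 = in2;
    return cv::findFundamentalMat(p1, p2, cv::FM_LMEDS);  // final LMEDS estimate
}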

OpenCV: How to get inlier points using findHomography()/findFundamental() and RANSAC

OpenCV does not provide a RANSAC function per se, or at least not in a form you can just call and be done with (e.g. cv::ransac(...)). All functions/methods that are able to use RANSAC have a flag that enables it. However, this is not always enough if you actually want to do something else with the inliers RANSAC computes after you have estimated a homography or fundamental matrix, for example create a plot of the points in Octave or similar software, or apply additional algorithms to the remaining set of filtered matches.
After matching two images you get a vector of matches. Along with that there are of course two sets of keypoints (one per image) that were used in the matching process. Using the matches and keypoints we create two vectors of points (e.g. cv::Point2f) and pass them to findHomography(). From this and this posts I discovered how the inliers are marked using a mask that we pass to that function: each row of the mask corresponds to an inlier/outlier. However, I was unable to figure out how to use that row index with my two sets of points, and looking at OpenCV's source code didn't get me far. In findFundamentalMat() (similar to findHomography() in its signature and mask handling) they use compressPoints(), which seems to somehow combine the two input sets (source and destination points) into one. While testing to determine the nature of the mask, I tried two sets of matched points (converting cv::KeyPoint to cv::Point2f, a standard procedure). Each set contains 300 points, so in total there are 600 points. The returned mask contains 300 rows (the values are not important here).
EDIT: While writing this I discovered the answer (see below) but decided to post the question anyway in case someone needs this information quickly and in compact form. Note that we still need one of OpenCV's functions that supports RANSAC. So if you have a set of points but no intention of computing a homography or fundamental matrix, this is obviously not the way, and I dare say I was unable to find anything in OpenCV's API that avoids this obstacle, so you would need an external library.
The solution is actually quite trivial. As we know, each row of the mask tells us whether we have an inlier or an outlier. But we pass two sets of points as input, so how does a row containing a single value represent two points? It becomes clear when you think about how the two point sets enter findHomography() (in my case I was computing the homography between two images): both sets contain the same number of points, simply because they are extracted from the matches between the image pair. This means a row of the mask is the index of the points in both sets, and also the index into the vector of matches between the two images. I have successfully referred to a small subset of matched points manually based on this, and the results are as expected. It is important that you do not alter the order of your matches or of the 2D points you extracted from them via the keypoints referenced in each cv::DMatch. Below is a simple example for a single pair of inliers.
for (int i = 0; i < matchesObjectScene.size(); ++i)
{
    // extract points from keypoints based on matches
    pointsObject.push_back(keypointsObject.at(matchesObjectScene.at(i).queryIdx).pt);
    pointsScene.push_back(keypointsScene.at(matchesObjectScene.at(i).trainIdx).pt);
}

// compute homography using RANSAC
cv::Mat mask;
cv::Mat H = cv::findHomography(pointsObject, pointsScene, CV_RANSAC, ransacThreshold, mask);
In the example above if we print some inlier
int maskRow = 10;
std::cout << "POINTS: object(" << pointsObject.at(maskRow).x << "," << pointsObject.at(maskRow).y << ") - scene(" << pointsScene.at(maskRow).x << "," << pointsScene.at(maskRow).y << ")" << std::endl;
and then again but this time using our keypoints (can also be done with the extracted 2D points)
std::cout << "POINTS (via match-set): object(" << keypointsObject.at(matchesCurrentObject.at(maskRow).queryIdx).pt.x << "," << keypointsObject.at(matchesCurrentObject.at(maskRow).queryIdx).pt.y << ") - scene(" << keypointsScene.at(matchesCurrentObject.at(maskRow).trainIdx).pt.x << "," << keypointsScene.at(matchesCurrentObject.at(maskRow).trainIdx).pt.y << ")" << std::endl;
we actually get the same output:
POINTS: object(462,199) - scene(485,49)
POINTS (via match-set): object(462,199) - scene(485,49)
To get the actual inliers we simply check whether the current row of the mask contains a non-zero value:
if ((unsigned int)mask.at<uchar>(maskRow))
    // store the match, keypoints or points somewhere where you can access them later
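Putting it together, a short sketch that collects the inlier matches and points by walking the mask; it assumes the same variable names as the snippet above:

// Collect inlier matches and points using the RANSAC mask from findHomography().
// Row i of 'mask' refers to matchesObjectScene[i], pointsObject[i] and pointsScene[i].
std::vector<cv::DMatch>  inlierMatches;
std::vector<cv::Point2f> inlierObject, inlierScene;
for (int i = 0; i < mask.rows; ++i)
{
    if (mask.at<uchar>(i))   // non-zero -> inlier
    {
        inlierMatches.push_back(matchesObjectScene.at(i));
        inlierObject.push_back(pointsObject.at(i));
        inlierScene.push_back(pointsScene.at(i));
    }
}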
On a different note, it may not be possible for RANSAC to exist as a standalone function in OpenCV, because RANSAC is an abstract technique for rejecting outliers. It relies on a base model to perform the outlier rejection, and that base model is very generic: it could be anything, not necessarily points that have some relationship among themselves. This could be why RANSAC only exists as an option in functions that perform some defined task with a defined scope, like findHomography, findFundamentalMat, etc.

Better estimation of Homography using Kalman filter?

I am creating an AR application that tracks features, calculates a homography, then obtains the object's pose from the 3D-2D point correspondences and uses it to render a 3D object.
I select a specific area of my source image for detecting features (by masking) and then match them against features detected on subsequent frames. I then filter those matches and estimate the homography of the unmasked region.
The problem lies in the homography estimation: it differs slightly on every frame. The effect is that even when I keep the camera still, I get a vibrating rectangle around the tracked region, which I draw using the estimated homography.
I have already posted a question titled Unstable homography estimation using ORB and got reassurance about something I was already considering (not recalculating the homography if the position of the region is similar to its last position).
However, I recently came across the Kalman filter, which gives a better estimate of the position by combining prior knowledge with the measurement observation.
So, after looking at various examples (one in particular, http://www.youtube.com/watch?v=GBYW1j9lC1I), I modelled a Kalman filter (rather four, one for each corner of the rectangular region) for my scenario:
m_KF1.init(4, 2, 1);
setIdentity(m_KF1.transitionMatrix);
m_measurement1 = Mat::zeros(2, 1, cv::DataType<float>::type);
m_KF1.statePre.setTo(0);
m_KF1.controlMatrix.setTo(0);

// initializing the filter with the first reading
m_KF1.statePre.at<float>(0) = m_scene_corners[1].x;
m_KF1.statePre.at<float>(1) = m_scene_corners[1].y;
m_KF1.statePre.at<float>(2) = 0;
m_KF1.statePre.at<float>(3) = 0;
setIdentity(m_KF1.measurementMatrix);
setIdentity(m_KF1.processNoiseCov, Scalar::all(.1));      // updated at every step
setIdentity(m_KF1.measurementNoiseCov, Scalar::all(4));   // assuming a measurement error of no more than 2 pixels
setIdentity(m_KF1.errorCovPost, Scalar::all(.1));
This gives 4 state variables (position in x, y and velocity in x, y), 2 measurement variables (position in x, y) and 1 control variable (acceleration).
The following steps are taken at every iteration:
// --- First, the prediction phase, to update the internal variables --- //
// 'dt' is the time elapsed between measurements

// Updating the transition matrix
m_KF1.transitionMatrix.at<float>(0, 2) = dt;
m_KF1.transitionMatrix.at<float>(1, 3) = dt;

// Updating the control matrix (4x1, since there is a single control input)
m_KF1.controlMatrix.at<float>(0, 0) = (dt * dt) / 2;
m_KF1.controlMatrix.at<float>(1, 0) = (dt * dt) / 2;
m_KF1.controlMatrix.at<float>(2, 0) = dt;
m_KF1.controlMatrix.at<float>(3, 0) = dt;

// Updating the process noise covariance matrix
m_KF1.processNoiseCov.at<float>(0, 0) = (dt * dt * dt * dt) / 4;
m_KF1.processNoiseCov.at<float>(0, 2) = (dt * dt * dt) / 2;
m_KF1.processNoiseCov.at<float>(1, 1) = (dt * dt * dt * dt) / 4;
m_KF1.processNoiseCov.at<float>(1, 3) = (dt * dt * dt) / 2;
m_KF1.processNoiseCov.at<float>(2, 0) = (dt * dt * dt) / 2;
m_KF1.processNoiseCov.at<float>(2, 2) = dt * dt;
m_KF1.processNoiseCov.at<float>(3, 1) = (dt * dt * dt) / 2;
m_KF1.processNoiseCov.at<float>(3, 3) = dt * dt;

Mat prediction1 = m_KF1.predict();
Point2f predictPt1(prediction1.at<float>(0), prediction1.at<float>(1));

// Get the measured corner (the measurement is a 2x1 vector)
m_measurement1.at<float>(0, 0) = scene_corners[0].x;
m_measurement1.at<float>(1, 0) = scene_corners[0].y;

// --- Then, the correction phase, which combines the prediction with the measurement --- //
Mat estimated = m_KF1.correct(m_measurement1);
Point2f statePt1(estimated.at<float>(0), estimated.at<float>(1));
This model hardly corrects my measured value at all.
Now my questions are:
Is a Kalman filter suited to my scenario? Will it give me any better results?
If it is, then what is missing? Am I modelling it correctly? Instead of creating four filters for the four corners of the rectangle, should I model it in some other way (for instance, take the 10 strongest matches based on distance and use those as input to the filter)?
If a Kalman filter isn't suited, what else can I do to give more stability to the estimated homography?
Any help would be highly appreciated.
Thanks.
This question is badly titled; after reading your explanation, what you are really asking is: "Why does my OpenCV Kalman filter still leave a lot of noise?"
Anyway, your answers are:
Yes, a Kalman filter works for your scenario.
You are using it wrong.
Modify KF.processNoiseCov. You can get the code from here: Opencv kalman filter prediction without new observation; there is a nice line there explaining it.
See:
See:
setIdentity(KF.processNoiseCov, Scalar::all(.005)); //adjust this for faster convergence - but higher noise
From what I can see, you have a very basic understanding of it, so you can start with a naive approach and use four 2D Kalman filters; for this you can use the code here: . It will work; from there, grow and adapt it until you get a better understanding.
After that, you can model it more closely to your problem, or you can keep using four filters; there is no "perfect" implementation, so if that works for you, just go for it.
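For illustration, here is a minimal sketch of one such 2D filter (constant-velocity model, no control input, one filter per corner); corner and measuredCorner are placeholder variables and the noise values are only starting points to tune:

#include <opencv2/video/tracking.hpp>

// One filter per tracked corner: state = [x, y, vx, vy], measurement = [x, y].
cv::KalmanFilter kf(4, 2, 0);                      // no control input
kf.transitionMatrix = (cv::Mat_<float>(4, 4) <<
    1, 0, 1, 0,
    0, 1, 0, 1,
    0, 0, 1, 0,
    0, 0, 0, 1);                                   // dt folded into the off-diagonal terms
cv::setIdentity(kf.measurementMatrix);
cv::setIdentity(kf.processNoiseCov,     cv::Scalar::all(0.005)); // lower -> smoother but laggier
cv::setIdentity(kf.measurementNoiseCov, cv::Scalar::all(4.0));   // pixel measurement noise
cv::setIdentity(kf.errorCovPost,        cv::Scalar::all(0.1));
kf.statePost = (cv::Mat_<float>(4, 1) << corner.x, corner.y, 0, 0);  // first reading

// Every frame:
cv::Mat prediction  = kf.predict();
cv::Mat measurement = (cv::Mat_<float>(2, 1) << measuredCorner.x, measuredCorner.y);
cv::Mat estimated   = kf.correct(measurement);
cv::Point2f smoothedCorner(estimated.at<float>(0), estimated.at<float>(1));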

RANSAC Algorithm

Can anybody please show me how to use the RANSAC algorithm to select common feature points in two images which have a certain portion of overlap? The problem comes from feature-based image stitching.
I implemented an image stitcher a couple of years back. The article on RANSAC on Wikipedia describes the general algorithm well.
When using RANSAC for feature-based image matching, what you want is to find the transform that best maps the first image onto the second image. This is the model described in the Wikipedia article.
If you have already got your features for both images and have found which features in the first image best match which features in the second image, RANSAC is used roughly like this.
The input to the algorithm is:
n - the number of random points to pick every iteration in order to create the transform. I chose n = 3 in my implementation.
k - the number of iterations to run
t - the threshold for the square distance for a point to be considered as a match
d - the number of points that need to be matched for the transform to be valid
image1_points and image2_points - two arrays of points of the same size. Assumes that image1_points[x] is best mapped to image2_points[x] according to the computed features.
best_model = null
best_error = Inf
for i = 0:k
    rand_indices = n random integers from 0:num_points
    base_points  = image1_points[rand_indices]
    input_points = image2_points[rand_indices]
    maybe_model  = find best transform from input_points -> base_points

    consensus_set = 0
    total_error   = 0
    for j = 0:num_points
        error = square distance between image2_points[j] transformed by maybe_model and image1_points[j]
        if error < t
            consensus_set += 1
            total_error   += error

    if consensus_set > d && total_error < best_error
        best_model = maybe_model
        best_error = total_error
The end result is the transform that best transforms the points in image2 to image1, which is exactly what you want when stitching.
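For completeness, with OpenCV you rarely write this loop yourself. Given already-matched point arrays as in the pseudocode above, a hedged sketch of the equivalent call would be:

#include <opencv2/calib3d.hpp>
#include <vector>

// image1_points[i] corresponds to image2_points[i], as in the pseudocode above.
std::vector<cv::Point2f> image1_points, image2_points;  // filled from your feature matches
std::vector<uchar> inlierMask;

// RANSAC picks minimal samples, fits a homography, and scores it by the number of
// points whose reprojection error is below the threshold (the 't' of the pseudocode).
cv::Mat H = cv::findHomography(image2_points, image1_points, cv::RANSAC,
                               3.0 /* reprojection threshold in pixels */, inlierMask);

// inlierMask[i] != 0 marks the correspondences consistent with the best model;
// warp image2 into image1's frame with H to stitch.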
