I have a set of more than 1000 images. For every image I extract SURF descriptors. Now I add a query image and want to find the most similar image in the set. For performance and memory reasons I only extract 200 keypoints with descriptors per image, and this is more or less my problem. At the moment I filter the matches by doing this:
Symmetry matching:
Simple brute-force matching in both directions, i.e. from Image1 to Image2 and from Image2 to Image1. I only keep the matches which exist in both directions.
List<Matches> match1 = BruteForceMatching.BFMatch(act.interestPoints, query.interestPoints);
List<Matches> match2 = BruteForceMatching.BFMatch(query.interestPoints, act.interestPoints);
List<Matches> finalMatch = FeatureMatchFilter.DoSymmetryTest(match1, match2);

float distance = 0;
for (int i = 0; i < finalMatch.size(); i++)
    distance += finalMatch.get(i).distance;

act.pic.distance = distance * (float) query.interestPoints.size() / (float) finalMatch.size();
I know there are more filter methods. As you can see, I try to weight the distances by the number of final matches, but I don't have the feeling I am doing this correctly. When I look at other approaches, it seems they all compute with all the interest points extracted from the image. Does anyone have a good approach for this, or a good idea how to weight the distances?
I know there is no golden solution, but some experiences, ideas and other approaches would be really helpful.
So "match1" represents the directed matches of one of the database images and "match2" a query image, "finalMatch" are all the matches between those images and "finalMatch.get(i).distance" is some kind of mean value between the two directed distances.
So what you do is calculate the mean of the distances and scale it by the number of interest points you have. The goal, I assume, is to have a good measure of how well the images match overall.
I am pretty sure the distance you calculate doesn't reflect that similarity very well. Dividing the sum of the distances by the number of matches makes some sense and this might give you an idea of similarity when compared to other query images, but scaling this value with the number of interest points just doesn't do anything meaningful.
First of all I would suggest that you get rid of this scaling. I'm not sure what your brute-force matching does exactly, but in addition to your symmetry test you should discard matches where the ratio of the first to the second candidate is too high (if I remember right, Lowe suggests a threshold of 0.8). Then, if it is a rigid scene, I would suggest that you apply some kind of fundamental matrix estimation (8-point algorithm + RANSAC) and filter the result using epipolar geometry. I'm pretty sure the mean descriptor distance of the "real" matches will give you a good idea of the "similarity" of the database image and the query.
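In case it helps, here is a rough Python/OpenCV sketch of that pipeline (ratio test, then fundamental-matrix + RANSAC filtering, then the mean descriptor distance of the surviving matches as the score). It is only an illustration of the idea, not your exact setup: SIFT is used instead of SURF because SURF needs the non-free contrib build, and the 0.8 ratio and the RANSAC threshold are just common defaults.

import cv2
import numpy as np

def similarity(img1, img2, ratio=0.8):
    # detect and describe (SIFT here; SURF would work the same way)
    sift = cv2.SIFT_create(nfeatures=200)
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Lowe's ratio test: keep a match only if it is clearly better than
    # the second-best candidate
    bf = cv2.BFMatcher(cv2.NORM_L2)
    good = []
    for pair in bf.knnMatch(des1, des2, k=2):
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    if len(good) < 8:                      # 8-point algorithm needs >= 8 matches
        return float('inf')

    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    # fundamental matrix + RANSAC; the mask marks the epipolar-consistent inliers
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 3.0, 0.99)
    if F is None:
        return float('inf')
    inliers = [m for m, keep in zip(good, mask.ravel()) if keep]
    if not inliers:
        return float('inf')

    # mean descriptor distance of the "real" matches (lower = more similar)
    return sum(m.distance for m in inliers) / len(inliers)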
I'm trying to use OpenCV to find a template in images. While OpenCV has several template matching methods, I have a hard time understanding the differences between them and when to use which just by looking at their mathematical equations:
CV_TM_SQDIFF
CV_TM_SQDIFF_NORMED
CV_TM_CCORR
CV_TM_CCORR_NORMED
CV_TM_CCOEFF
Can someone explain the major differences between all these methods in a non-mathematical way?
The general idea of template matching is to give each location in the target image I, a similarity measure, or score, for the given template T. The output of this process is the image R.
Each element in R is computed from the template, which spans over the ranges of x' and y', and a window in I of the same size.
Now, you have two windows and you want to know how similar they are:
CV_TM_SQDIFF - Sum of Square Differences (or SSD):
Simple Euclidean distance (squared):
Take every pair of pixels and subtract
Square the difference
Sum all the squares
CV_TM_SQDIFF_NORMED - SSD Normed
This is rarely used in practice, but the normalization part is similar in the next methods.
The numerator is the same as above, but it is divided by a normalization factor, namely the
- square root of the product of:
the sum of the squared template values
the sum of the squared image-window values
CV_TM_CCORR - Cross Correlation
Basically, this is a dot product:
Take every pair of pixels and multiply
Sum all products
CV_TM_CCOEFF - Cross Coefficient
Similar to cross correlation, but the means of the template and the window are subtracted first (i.e. it works with their covariances), which I find hard to explain without math; I would refer to mathworld or mathworks for some examples.
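To make the difference between the methods a bit more concrete, here is a small NumPy sketch (just an illustration; the function and array names are made up) that computes the four scores for a single window position. matchTemplate simply does this for every (x, y) at once:

import numpy as np

def scores_at(I, T, x, y):
    # window of the image I at (x, y), same size as the template T
    h, w = T.shape
    W = I[y:y+h, x:x+w].astype(np.float64)
    T = T.astype(np.float64)

    sqdiff        = np.sum((W - T) ** 2)                                # CV_TM_SQDIFF
    sqdiff_normed = sqdiff / np.sqrt(np.sum(T ** 2) * np.sum(W ** 2))   # CV_TM_SQDIFF_NORMED
    ccorr         = np.sum(W * T)                                       # CV_TM_CCORR (dot product)
    # CV_TM_CCOEFF: like CCORR, but both windows are centred by subtracting
    # their means first, so constant brightness offsets cancel out
    ccoeff        = np.sum((W - W.mean()) * (T - T.mean()))
    return sqdiff, sqdiff_normed, ccorr, ccoeff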
I am trying to find the similarity between two images using correlograms. I have already created the (2D) correlograms for both images.
Task: Now I want to find out how similar these two correlograms are.
Problem: I do not know how I can match these two correlograms. Can I match them in a similar way to how we match two histograms?
Formula comparison:
According to a research paper, histogram matching and correlogram matching are defined as follows. For histograms the summation is taken only over the difference between corresponding colour bins (roughly, the sum over i of |h1(i) - h2(i)| / (1 + h1(i) + h2(i))), whereas for correlogram matching the same ratio is summed over two dimensions, i.e. distance and colour bin.
My code: I have two matrices, Mat correlogram1 and Mat correlogram2, in which I store the correlogram values for the two images. Then I try to match them using the following code, which is based upon the formula mentioned above.
double correlogramMatching(Mat correlogram1, Mat correlogram2)
{
    double confidenceValue = 0;
    for (int i = 0; i < ColorBins; i++)
    {
        for (int j = 0; j < DistanceRange; j++)
        {
            // |c1 - c2| / (1 + c1 + c2), summed over colour bins and distances
            double value = std::abs(correlogram1.at<double>(i, j) - correlogram2.at<double>(i, j))
                           / (1 + correlogram1.at<double>(i, j) + correlogram2.at<double>(i, j));
            confidenceValue = confidenceValue + value;
        }
    }
    return confidenceValue;
}
Confusion: For two identical images the value of confidenceValue is zero, and for two not-so-similar images the values are something like 66, 88, and so on. So below which value should I consider the two images similar?
PS: I am doing the programming in OpenCV (C++).
A correlogram is also a co-occurrence matrix / histogram. To answer your question, the simple answer is yes. Remember, when you're comparing histograms by themselves, you are comparing the grayscale / colour content between two images / patches. By extending this to correlograms / co-occurrence matrices, you are also comparing the spatial distributions of the colours as well, which are handled by the third dimension, the distance, of the histogram. If you had two images that had the same colour distribution, but the spatial distributions are different, the histogram measures will also take this into account and will report a high dissimilarity / low similarity between them.
As such, you are perfectly fine in using standard histogram comparison measures between two correlograms (and I'm also speaking from experience). You can simply use any standard technique that compares histograms. Examples include histogram intersection, the L_p norm, the chi-squared distance, the Bhattacharyya distance and so on.
Take a look at the following link for more details. There are some great histogram similarity / dissimilarity measures you can use to compare between two histograms, each with their own advantages and disadvantages. Also, Ander Biguri raised a good point. Be sure to normalize the contrast between each view to make the content between the histograms somewhat contrast and illumination independent.
Link: http://pi-virtualworld.blogspot.ca/2013/09/histogram-similarity.html
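For what it's worth, here is a minimal Python/OpenCV sketch of that suggestion (the C++ cv::compareHist call takes the same arguments). It assumes the two correlograms are float matrices of the same size and, for the Bhattacharyya interpretation, normalized:

import cv2
import numpy as np

def compare_correlograms(c1, c2):
    c1 = np.asarray(c1, dtype=np.float32)
    c2 = np.asarray(c2, dtype=np.float32)
    return {
        'intersection':  cv2.compareHist(c1, c2, cv2.HISTCMP_INTERSECT),      # higher = more similar
        'chi_squared':   cv2.compareHist(c1, c2, cv2.HISTCMP_CHISQR),         # lower  = more similar
        'bhattacharyya': cv2.compareHist(c1, c2, cv2.HISTCMP_BHATTACHARYYA),  # 0 = identical (normalized input)
        'correlation':   cv2.compareHist(c1, c2, cv2.HISTCMP_CORREL),         # 1 = identical
    }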
I'm attempting to wrap my head around the basics of CV. The bit that initially got me interested was template matching (it was mentioned in a Pycon talk unrelated to CV), so I figured I'd start there.
I started with this image:
Out of which I want to detect Mario. So I cut him out:
I understand the concept of sliding the template around the image to see the best fit, and following a tutorial, I'm able to find mario with the following code:
import time
import cv

def match_template(img, template):
    s = time.time()
    img_size = cv.GetSize(img)
    template_size = cv.GetSize(template)
    img_result = cv.CreateImage((img_size[0] - template_size[0] + 1,
                                 img_size[1] - template_size[1] + 1),
                                cv.IPL_DEPTH_32F, 1)
    cv.Zero(img_result)
    cv.MatchTemplate(img, template, img_result, cv.CV_TM_CCORR_NORMED)

    min_val, max_val, min_loc, max_loc = cv.MinMaxLoc(img_result)
    # inspect.getargspec(cv.MinMaxLoc)
    print min_val
    print max_val
    print min_loc
    print max_loc

    # draw a rectangle at the best-matching location
    cv.Rectangle(img, max_loc,
                 (max_loc[0] + template.width, max_loc[1] + template.height),
                 cv.Scalar(120.), 2)
    print time.time() - s

    cv.NamedWindow("Result")
    cv.ShowImage("Result", img)
    cv.WaitKey(0)
    cv.DestroyAllWindows()
So far so good, but then I came to realize that this is incredibly fragile. It will only ever find Mario with that specific background, and with that specific animation frame being displayed.
So I'm curious: given that Mario will always have the same Mario-ish attributes (size, colors), is there a technique with which I could find him regardless of whether his current frame is standing still or one of the various run-cycle sprites? Kind of like the fuzzy matching you can do on strings, but for images.
Maybe since he's the only red thing, there is a way of simply tracking the red pixels?
The whole other issue is removing the background from the template. Maybe that would help the MatchTemplate function find Mario even though he doesn't exactly match the template? As of now, I'm not entirely sure how that would work (I see that there is a mask param in MatchTemplate, but I'll have to investigate further).
My main question is whether or not template matching is the way to go about detecting an image that is mostly the same, but varies (like when he's walking), or is there another technique I should look into?
Update:
Attempts at matching other Marios
Going off of mmgp's suggestion that it should be workable for matching other things, I ran a couple of tests.
I used this as the template to match:
And then took a couple of screen shots to test the matching against.
For the first, I successfully find Mario, and get a max value of 1.
However, trying to find jumping Mario results in a complete misfire.
Now granted, the Mario in the template and the Mario in the scene are facing opposite directions, as well as being different animation frames, but I would think they still match a lot more than anything else in the image -- if only for the colors alone. But it targets the platform as being the closest match to the template.
Note that the max value for this one was 0.728053808212.
Next I tried a scene without mario to see what would happen.
But oddly enough, I get the exact same result as the image with jumping Mario -- right down to the similarity value: 0.728053808212. Mario being in the picture is just as accurate as him not being in the picture.
Really strange! I don't know the actual details of the underlying algorithm, but I'd imagine, from a standard deviation perspective, the boxes in the scene that at least match the Red in template Mario's suit would be closer to the mean distance than a blue platform, right? So, it's extra confusing that it's not even in the general area of where I would expect it to be.
I'm guessing this is user error on my end, or maybe just a misunderstanding.
Why would a scene with a similar Mario have as much of a match as a scene with no Mario at all?
No method is infallible, but template matching does have a good chance of working there. It might require some pre-processing, and until there is a larger sample (a short video, for example) to demonstrate the possible problems, there isn't much point in trying more advanced methods simply because some library implements them for you -- especially if you don't know under which conditions they are expected to work.
For instance, here are the results I get using template matching (red rectangles) -- all of them use the template http://i.stack.imgur.com/EYs9B.png, even the last one:
To achieve that I started by considering only the red channel of both the template and the input image. From that we easily calculate the internal morphological gradient, and only then perform the matching. In order not to get a rectangle when Mario is not present, you need to set a minimum threshold for the matching. Here is the template and one of the images after these two transformations:
And here is some sample code to achieve that:
import sys
import cv2
import numpy

img = cv2.imread(sys.argv[1])
img2 = img[:, :, 2]                              # red channel only
img2 = img2 - cv2.erode(img2, None)              # internal morphological gradient

template = cv2.imread(sys.argv[2])[:, :, 2]
template = template - cv2.erode(template, None)

ccnorm = cv2.matchTemplate(img2, template, cv2.TM_CCORR_NORMED)
print ccnorm.max()

loc = numpy.where(ccnorm == ccnorm.max())
threshold = 0.4
th, tw = template.shape[:2]
for pt in zip(*loc[::-1]):
    if ccnorm[pt[::-1]] < threshold:
        continue
    cv2.rectangle(img, pt, (pt[0] + tw, pt[1] + th),
                  (0, 0, 255), 2)
cv2.imwrite(sys.argv[2], img)                    # note: this overwrites the template path
I expect it to fail in more varied situations, but there are a couple of easy adjustments to be done.
Template matching doesn't always give good results. You should look into keypoint matching.
Step 1: Find Keypoints
Let's assume that you managed to cut out Mario or get an ROI image of Mario. Make this image your template image. Now, find keypoints in the main image and also in the template. So now you have two sets of keypoints: one for the image and the other for Mario (the template).
You can use SIFT, SURF, ORB depending on your preferences.
[EDIT]:
This is what I got using this method with SIFT and flann based knn matching. I haven't done the bounding box part.
Since your template is very small, SIFT and SURF would not give many keypoints. But to get a good number of feature points, you could try the Harris corner detector. I applied Harris corners on the image and got pretty good points on Mario.
Step 2: Match Keypoints
If you have used SIFT or SURF, you'd have descriptors for both the image and the template. Match these keypoints using KNN or some other efficient matching algorithm. If you are using OpenCV, I'd suggest you look into the FLANN-based matcher. After matching the keypoints, you'll want to filter out the incorrect matches. You can do this with k-nearest neighbours, and depending on the distance of the nearest match you can filter out more keypoints. You can further filter your matches using the forward-backward error.
Forward-Backward Error Estimation:
Match the template keypoints to the image keypoints. This will give you a set of matches.
Match the image keypoints to the template keypoints. This will give you another set of matches.
The common set of both these sets will filter out incorrect matches (see the sketch below).
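As a side note, OpenCV's brute-force matcher can do this forward-backward filtering for you. A minimal sketch (ORB and the file names are just placeholders for whatever detector and images you actually use):

import cv2

scene_img    = cv2.imread('scene.png', cv2.IMREAD_GRAYSCALE)           # hypothetical paths
template_img = cv2.imread('mario_template.png', cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create()
kp_tpl, des_tpl = orb.detectAndCompute(template_img, None)
kp_img, des_img = orb.detectAndCompute(scene_img, None)

# crossCheck=True keeps only mutual nearest neighbours, i.e. exactly the
# common set of the forward and backward match lists described above
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(des_tpl, des_img), key=lambda m: m.distance)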
[EDIT]:
If you are using the Harris corner detector, you'd get only points and not keypoints. You can either convert them into keypoints or write your own brute-force matcher. It's not that difficult.
Step 3: Estimation
After filtering the keypoints, you'd have a cluster of keypoints near your object (in this case, Mario) and a few scattered keypoints. To eliminate these scattered keypoints, you could use clustering. DBSCAN clustering will help you get a good cluster of points.
Step 4: Bounding Box Estimation
Now you have a cluster of keypoints. Using k-means, you should try to find the center of the cluster. Once you obtain the center of the cluster, you can estimate the bounding box (a sketch follows below).
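Here is a rough sketch of steps 3 and 4, assuming scikit-learn for the DBSCAN part and simply taking the bounding box of the largest cluster instead of a k-means centre. matched_pts stands for the (x, y) positions of the keypoints that survived step 2:

import numpy as np
from sklearn.cluster import DBSCAN

def bounding_box_from_matches(matched_pts, eps=30, min_samples=5):
    pts = np.asarray(matched_pts, dtype=np.float32)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(pts)

    valid = labels[labels >= 0]          # label -1 marks the scattered "noise" points
    if valid.size == 0:
        return None
    largest = np.bincount(valid).argmax()
    cluster = pts[labels == largest]

    x_min, y_min = cluster.min(axis=0)
    x_max, y_max = cluster.max(axis=0)
    return int(x_min), int(y_min), int(x_max), int(y_max)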
I hope this helps.
[EDIT]
Trying to match points using Harris corners. After filtering the Harris corners, I'm using a brute-force method to match the points. Some better matching algorithm might give you better results.
What is the best way to match the scan (taken photo) point sets to the template point set (blue,green,red,pink circles in the images)?
I am using OpenCV/C++. Maybe some kind of ICP algorithm? I would like to wrap the scan image to the template image!
template point set:
scan point set:
If the object is reasonably rigid and aligned, simple auto-correlation would do the trick.
If not, I would use RANSAC to estimate the transformation between the subject and the template (it seems that you have the feature points). Please provide some details on the problem.
Edit:
RANSAC (Random Sample Consensus) could be used in your case. Think of unnecessary points in your template as noise (false features detected by a feature detector) - they are the outliers. RANSAC can handle outliers, because it chooses a small subset of feature points (the minimal amount that can initiate your model) randomly, initiates the model and calculates how well the model matches the given data (how many other points in the template correspond to your other points). If you choose a wrong subset, this value will be low and you will drop the model. If you choose a right subset, it will be high and you can improve your match with an LMS algorithm.
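To make that concrete, a small OpenCV sketch (the point arrays are invented): if you already have candidate correspondences between the scan points and the template points, estimateAffinePartial2D fits a rotation + uniform scale + translation model with RANSAC and reports which correspondences are inliers, so outliers from false detections simply fall out of the model:

import cv2
import numpy as np

# candidate correspondences (placeholder values)
scan_pts     = np.float32([[12, 15], [310, 18], [305, 420], [14, 415], [160, 220]])
template_pts = np.float32([[ 0,  0], [300,  0], [300, 400], [ 0, 400], [150, 200]])

M, inlier_mask = cv2.estimateAffinePartial2D(
    scan_pts, template_pts, method=cv2.RANSAC, ransacReprojThreshold=5.0)

print("estimated 2x3 transform:\n", M)
print("inliers:", inlier_mask.ravel().astype(bool))

# the scan could then be aligned with
# aligned = cv2.warpAffine(scan_img, M, (template_width, template_height))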
Do you have to match the red rectangles? The original image contains four black rectangles in the corners that seem to be made for matching. I can reliably find them with 4 lines of Mathematica code:
lotto = [source image]
lottoBW = Image[Map[Max, ImageData[lotto], {2}]]
This takes max(R,G,B) for each pixel, i.e. it filters out the red and yellow print (more or less). The result looks like this:
Then I just use a LoG filter to find the dark spots and look for local maxima in the result image
lottoBWG = ImageAdjust[LaplacianGaussianFilter[lottoBW, 20]]
MaxDetect[lottoBWG, 0.5]
Result:
Have you looked at OpenCV's descriptor_extractor_matcher.cpp sample? This sample uses RANSAC to detect the homography between the two input images. I assume when you say wrap you actually mean warp? If you would like to warp the image with the homography matrix you detect, have a look at the warpPerspective function. Finally, here are some good tutorials using the different feature detectors in OpenCV.
EDIT :
You may not have SURF features, but you certainly have feature points with different classes. Feature based matching is generally split into two phases: feature detection (which you have already done), and extraction which you need for matching. So, you might try converting your features into a KeyPoint and then doing the feature extraction and matching. Here is a little code snippet of how you might go about this:
const int RED_TYPE    = 1;
const int GREEN_TYPE  = 2;
const int BLUE_TYPE   = 3;
const int PURPLE_TYPE = 4;

struct BenFeature
{
    Point2f pt;
    int classId;
};

vector<BenFeature> benFeatures;

// Detect the features as you normally would, in addition setting the class ID

vector<KeyPoint> keypoints;
for (int i = 0; i < benFeatures.size(); i++)
{
    BenFeature bf = benFeatures[i];
    KeyPoint kp(bf.pt,
                10.0,       // feature neighborhood diameter (you'll probably need to tune it)
                -1.0,       // (angle) -1 == not applicable
                500.0,      // feature response strength (set to the same unless you have a metric describing strength)
                1,          // octave level (ditto as above)
                bf.classId  // RED, GREEN, BLUE, or PURPLE
               );
    keypoints.push_back(kp);
}

// now proceed with extraction and matching...
You may need to tune the response strength such that it doesn't get thresholded out by the extraction phase. But, hopefully that's illustrative of what you might try to do.
Follow these steps:
Match points or features in two images, this will determine your wrapping;
Determine what transformation you are looking for in your wrapping. The most general would be a homography (see cv::findHomography()) and the least general would be a simple translation (use cv::matchTemplate()). The intermediate case would be translation along x, y plus rotation. For this I wrote a fast function that is better than homography since it uses fewer degrees of freedom while still optimizing the right metric (squared differences in coordinates):
https://stackoverflow.com/a/18091472/457687
If you think your matches have a lot of outliers use RANSAC on top of your step 1. You basically need to randomly select a minimal set of points required for finding parameters, solve, determine inliers, solve again using all inliers, and then iterate trying to improve your current solution (increase the number of inliers, reduce error, or both). See Wikipedia for RANSAC algorithm: http://en.wikipedia.org/wiki/Ransac
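A short Python sketch of steps 1-3 (the point arrays and file names are placeholders; the C++ calls cv::findHomography and cv::warpPerspective take the same arguments): estimate the homography from the matched points with RANSAC, then warp the scan into the template frame:

import cv2
import numpy as np

# matched point sets from step 1 (placeholder values)
scan_pts     = np.float32([[34, 40], [410, 38], [415, 570], [30, 575]])
template_pts = np.float32([[ 0,  0], [400,  0], [400, 560], [ 0, 560]])

H, inlier_mask = cv2.findHomography(scan_pts, template_pts, cv2.RANSAC, 5.0)

scan   = cv2.imread('scan.jpg')                       # hypothetical path
warped = cv2.warpPerspective(scan, H, (400, 560))     # (width, height) of the template
cv2.imwrite('scan_warped.jpg', warped)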
I'd like to know what would be the best strategy to compare a group of contours (in fact, edges resulting from Canny edge detection) from two pictures, in order to know which pair is most alike.
I have this image:
http://i55.tinypic.com/10fe1y8.jpg
And I would like to know how can I calculate which one of these fits best to it:
http://i56.tinypic.com/zmxd13.jpg
(it should be the one on the right)
Is there any way to compare the contours as a whole?
I can easily rotate the images but I don't know what functions to use in order to calculate that the reference image on the right is the best fit.
Here is what I've already tried using OpenCV:
matchShapes function - I tried this function using two grayscale images, and I always get the same result for every comparison image, and the value seems wrong, as it is 0.0002.
So what I realized about matchShapes, but I'm not sure it's the correct assumption, is that the function works with pairs of contours and not full images. Now this is a problem because although I have the contours of the images I want to compare, they are hundreds and I don't know which ones should be "paired up".
So I also tried to compare all the contours of the first image against the other two with a for loop, but I might be comparing, for example, the contour of the 5 against the circle contour of the two reference images and not the contour of the 2.
I also tried the simple cv::compare function and matchTemplate, neither with success.
Well, for this you have a couple of options depending on how robust you need your approach to be.
Simple Solutions (with assumptions):
For these methods, I'm assuming the images you supplied are what you are working with (i.e., the objects are already segmented and approximately the same scale). Also, you will need to correct the rotation (at least in a coarse manner). You might do something like iteratively rotating the comparison image every 10, 30, 60, or 90 degrees, or whatever coarseness you feel you can get away with.
For example,
for(degrees = 0; degrees < 360; degrees += 10)
    coinRot = rotate(compareCoin, degrees)
    // you could also try Cosine Similarity, or even matchTemplate here.
    metric = SAD(coinRot, targetCoin)
    if(metric < bestMetric)   // for SAD, a lower value is a better match
        bestMetric = metric
        coinRotation = degrees
Sum of Absolute Differences (SAD): This will allow you to quickly compare the images once you have determined an approximate rotation angle (a runnable sketch follows below).
Cosine Similarity: This operates a bit differently by treating each image as a 1D vector and then computing the high-dimensional angle between the two vectors. The better the match, the smaller the angle will be.
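Here is a runnable version of that rotation sweep with SAD, assuming both images are grayscale, the same size and roughly segmented (the file names are placeholders). Remember that for SAD a lower value means a better match:

import cv2
import numpy as np

target  = cv2.imread('target_coin.png', cv2.IMREAD_GRAYSCALE)
compare = cv2.imread('compare_coin.png', cv2.IMREAD_GRAYSCALE)

h, w = target.shape
center = (w / 2.0, h / 2.0)

best_sad, best_angle = np.inf, 0
for degrees in range(0, 360, 10):
    R = cv2.getRotationMatrix2D(center, degrees, 1.0)
    rotated = cv2.warpAffine(compare, R, (w, h))
    sad = np.sum(np.abs(rotated.astype(np.int32) - target.astype(np.int32)))
    if sad < best_sad:
        best_sad, best_angle = sad, degrees

print("best rotation:", best_angle, "degrees (SAD =", best_sad, ")")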
Complex Solutions (possibly more robust):
These solutions will be more complex to implement, but will probably yield more robust classifications.
Hausdorff Distance: This answer will give you an introduction on using this method. This solution will probably also need the rotation correction to work properly.
Fourier-Mellin Transform: This method is an extension of Phase Correlation, which can extract the rotation, scale, and translation (RST) transform between two images.
Feature Detection and Extraction: This method involves detecting "robust" (i.e., scale and/or rotation invariant) features in the image and comparing them against a set of target features with RANSAC, LMedS, or simple least squares. OpenCV has a couple of samples using this technique in matcher_simple.cpp and matching_to_many_images.cpp. NOTE: With this method you will probably not want to binarize the image, so there are more detectable features available.