I am trying to write a function in OpenCV to compare two images, imageA and imageB, and check to what extent they are similar.
I want to arrive at three comparison scores (each a value from 0 to 100), as shown below.
1. Histograms - compareHist() : OpenCV method
2. Template Matching - matchTemplate() : OpenCV method
3. Feature Matching - BFMatcher() : OpenCV method
Based on the scores derived from the above calculations, I want to arrive at a conclusion regarding the match.
I was successful in getting these functions to work, but not in deriving a comparison score from them. It would be great if someone could help me with that. Also, any other advice regarding this sort of image matching is welcome.
I know there are different kinds of algorithms that can be used for the above functions, so just to clarify the kind of images I will be using:
1. As mentioned above, it will be a one-to-one comparison.
2. All images are taken by a human using a mobile camera.
3. Matching images will mostly be taken of the same object/place from the same spot. (Depending on the time of day, the lighting could differ.)
4. If the images don't match, the user will be asked to take another one until it matches.
5. The kind of images compared could include a corridor, an office table, a computer screen (content on the screen to be compared), a paper document, etc.
1- With histograms you can get a comparison score using histogram intersection. Dividing the intersection of the two histograms by their union gives you a score between 0 (no match at all) and 1 (complete match), like the example in the graph below:
You can compute the histogram intersection with a simple for loop.
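For example, here is a minimal sketch of that score in Python (assuming BGR input images and plain 256-bin grayscale histograms; the function name is just for illustration):

    import cv2
    import numpy as np

    def histogram_score(imageA, imageB, bins=256):
        # Build grayscale histograms for both images.
        grayA = cv2.cvtColor(imageA, cv2.COLOR_BGR2GRAY)
        grayB = cv2.cvtColor(imageB, cv2.COLOR_BGR2GRAY)
        histA = cv2.calcHist([grayA], [0], None, [bins], [0, 256]).ravel()
        histB = cv2.calcHist([grayB], [0], None, [bins], [0, 256]).ravel()

        # Intersection / union, computed bin by bin.
        intersection = np.minimum(histA, histB).sum()
        union = np.maximum(histA, histB).sum()
        return 100.0 * intersection / union   # 0 = no overlap, 100 = identical histograms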
2- In template matching, the score you get differs for each comparison method. In this link you can see the details of each method. In some methods the highest score means the best match, while in others the lowest score means the best match. To define a score between 0 and 1, you should consider two reference scores: one for matching an image with itself (the best possible score) and one for matching two completely different images (the worst score), and then normalize the scores by the number of pixels in the image (height*width).
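As a rough sketch of that normalization (assuming same-sized images and the squared-difference method TM_SQDIFF, where lower means a better match; the worst-case reference value is something you would calibrate yourself once on two completely different images):

    import cv2

    def template_score(imageA, imageB, worst_sqdiff_per_pixel):
        # imageA and imageB are assumed to have the same size and type.
        h, w = imageA.shape[:2]

        # TM_SQDIFF: 0 means a perfect match, larger values mean a worse match.
        raw = cv2.matchTemplate(imageA, imageB, cv2.TM_SQDIFF)[0][0]
        per_pixel = raw / float(h * w)          # normalize by the number of pixels

        # Map to 0..100: 100 = identical, 0 = as bad as the calibrated worst case.
        score = 100.0 * (1.0 - per_pixel / worst_sqdiff_per_pixel)
        return max(0.0, min(100.0, score))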
3- Feature matching is different from the last two methods. You may have two similar images with few features (which fail to match), or two conceptually different images with many matched features. Still, if the images are feature-rich, we can define a score. For this purpose, consider this example:
Img1 has 200 features
Img2 has 170 features
These two images have 100 matched features
Consider 0.5 (100/200) as the whole image matching score
You could also factor the distances between the matched pairs of features into the scoring, but I think that's enough.
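A small sketch of that scoring idea, using ORB and a brute-force matcher only because they ship with every OpenCV build (the original answer does not prescribe a particular detector, and the 0.75 ratio threshold is just a common choice):

    import cv2

    def feature_score(imageA, imageB):
        orb = cv2.ORB_create()
        kpA, desA = orb.detectAndCompute(imageA, None)
        kpB, desB = orb.detectAndCompute(imageB, None)
        if desA is None or desB is None:
            return 0.0   # one of the images has no usable features

        # Hamming distance for ORB's binary descriptors, ratio test to keep good matches.
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
        good = []
        for pair in matcher.knnMatch(desA, desB, k=2):
            if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
                good.append(pair[0])

        # Matched features / feature count of the richer image (100/200 -> 50).
        return 100.0 * len(good) / max(len(kpA), len(kpB))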
Regarding the comparison score: have you tried implementing a weighted average to get a final comparison metric? Weight the three matching methods you are implementing according to their accuracy; the best method gets the "heaviest" weight.
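A trivial sketch of such a weighted combination (the weights are placeholders you would tune from your own accuracy measurements):

    def combined_score(hist_score, template_score, feature_score,
                       weights=(0.2, 0.3, 0.5)):
        # All three inputs are assumed to be on the same 0..100 scale.
        w_hist, w_tmpl, w_feat = weights
        total = w_hist * hist_score + w_tmpl * template_score + w_feat * feature_score
        return total / (w_hist + w_tmpl + w_feat)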
Also, if you want to explore additional matching methods, give FFT-based matching a try: http://machineawakening.blogspot.com/2015/12/fft-based-cosine-similarity-for-fast.html
I'm implementing a cache for virtual reality applications: given an input image query, return the result associated with the most visually similar cached image (i.e. a previously processed query) if the distance between the query representation and the cached image representation is lower than a certain threshold. Our cache is relatively small and contains 10k image representations.
We use VLAD codes [1] as image representation since they are very compact and incredibly fast to compute (around 1 ms).
However, it has been shown in [2] that the distance between the query code and the images in the dataset (the cache in this case) varies a lot from query to query, so it's not trivial to find an absolute threshold. In the same work a method is proposed for object detection applications, which is not relevant in this context (we return just the most similar image, not all and only the images containing the query subject).
[3] offers a very precise method, but at the same time it's very expensive and operates on short lists. It's based on spatial feature-matching re-ranking; if you want more details, the quoted section is at the end of this question. I'm not an expert in computer vision, but this step sounds to me a lot like running a feature matcher on the short-list of the top-k elements according to the image representation and re-ranking them based on the number of features matched. My first question is: is that correct?
In our case this approach is not a problem, since most of the time the top-10 most similar VLAD codes contain the query subject, so we would only need to do this spatial matching step on 10 images.
However, at this point I have a second question: if we had the problem of deciding an absolute threshold for image representations (such as VLAD codes), does this problem still persist with this approach? In the first case, the threshold was "the L2 distance between the query VLAD code and the closest VLAD code"; here the threshold would instead represent "the number of features matched between the query image and the image closest to it according to VLAD codes".
Of course, my second question only makes sense if the answer to the first one is positive.
The approach of [3]:
Geometrical Re-ranking verifies the global geometrical consistency between matches (Lowe 2004; Philbin et al. 2007) for a short-list of database images returned by the image search system. Here we implement the approach of Lowe (2004) and apply it to a short-list of 200 images. We first obtain a set of matches, i.e., each descriptor of the query image is matched to the 10 closest ones in all the short-list images. We then estimate an affine 2D transformation in two steps. First, a Hough scheme estimates a transformation with 4 degrees of freedom. Each pair of matching regions generates a set of parameters that "vote" in a 4D histogram. In a second step, the sets of matches from the largest bins are used to estimate a finer 2D affine transform. The images for which the geometrical estimation succeeds are returned in first positions and ranked with a score based on the number of inliers. The images for which the estimation failed are appended to the geometrically matched ones, with their order unchanged.
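For what it's worth, here is a much simplified sketch of that re-ranking step (ratio-test matches plus a RANSAC-estimated 4-DOF transform and an inlier count, not the full 4D Hough voting of [3]; SIFT, which needs OpenCV 4.4+ or the contrib package, and all thresholds are illustrative assumptions):

    import cv2
    import numpy as np

    def rerank_shortlist(query_img, shortlist_imgs):
        # Re-rank a short-list of candidates by the number of geometrically consistent matches.
        sift = cv2.SIFT_create()
        kp_q, des_q = sift.detectAndCompute(query_img, None)
        matcher = cv2.BFMatcher(cv2.NORM_L2)

        scored = []
        for idx, img in enumerate(shortlist_imgs):
            kp_c, des_c = sift.detectAndCompute(img, None)
            good = []
            if des_q is not None and des_c is not None:
                for pair in matcher.knnMatch(des_q, des_c, k=2):
                    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
                        good.append(pair[0])
            inliers = 0
            if len(good) >= 3:
                src = np.float32([kp_q[m.queryIdx].pt for m in good])
                dst = np.float32([kp_c[m.trainIdx].pt for m in good])
                # 4-DOF (similarity) transform estimated with RANSAC.
                _, mask = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
                inliers = int(mask.sum()) if mask is not None else 0
            scored.append((inliers, idx))

        # Candidates with more inliers come first; failed estimations keep score 0.
        return [idx for inliers, idx in sorted(scored, reverse=True)]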
Looking at the Histogram Documentation, there are 4(5) different comparison methods:
CV_COMP_CORREL Correlation
CV_COMP_CHISQR Chi-Square
CV_COMP_INTERSECT Intersection
CV_COMP_BHATTACHARYYA Bhattacharyya distance
CV_COMP_HELLINGER Synonym for CV_COMP_BHATTACHARYYA
They all give different outputs that are interpreted differently, as shown in the Compare Histogram Documentation. But I can't find anything that states how effective each method is compared to the others. Surely there are pros and cons for each method, otherwise why have multiple methods?
Even the OpenCV 2 Computer Vision Application Programming Cookbook has very little to say on the differences:
The call to cv::compareHist is straightforward. You just input the two
histograms and the function returns the measured distance. The
specific measurement method you want to use is specified using a flag.
In the ImageComparator class, the intersection method is used (with
flag CV_COMP_INTERSECT). This method simply compares, for each bin,
the two values in each histogram, and keeps the minimum one. The
similarity measure is then simply the sum of these minimum values.
Consequently, two images having histograms with no colors in common
would get an intersection value of 0, while two identical histograms
would get a value equal to the total number of pixels.
The other methods available are the Chi-Square (flag CV_COMP_CHISQR)
which sums the normalized square difference between the bins, the
correlation method (flag CV_COMP_CORREL) which is based on the
normalized cross-correlation operator used in signal processing to
measure the similarity between two signals, and the Bhattacharyya
measure (flag CV_COMP_BHATTACHARYYA) used in statistics to estimate
the similarity between two probabilistic distributions.
There must be differences between the methods, so my question is: what are they, and under what circumstances does each work best?
CV_COMP_INTERSECT is fast to compute since you just need the minimum value for each bin, but it will not tell you much about the distribution of the differences. The other methods try to achieve a better and more continuous matching score, under different assumptions about the pixel distribution.
You can find the formulae used by the different methods at:
http://docs.opencv.org/doc/tutorials/imgproc/histograms/histogram_comparison/histogram_comparison.html
Some references to more details on the matching algorithms can be found at:
http://siri.lmao.sk/fiit/DSO/Prednasky/7%20a%20Histogram%20based%20methods/7%20a%20Histogram%20based%20methods.pdf
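For a quick side-by-side feel of how the methods behave, here is a small sketch using hue/saturation histograms as in the OpenCV tutorial linked above (note that in recent OpenCV versions the CV_COMP_* constants are exposed as cv2.HISTCMP_*):

    import cv2

    def compare_all(imageA, imageB):
        hsvA = cv2.cvtColor(imageA, cv2.COLOR_BGR2HSV)
        hsvB = cv2.cvtColor(imageB, cv2.COLOR_BGR2HSV)

        # Hue/saturation histograms, normalized so the scores are comparable.
        histA = cv2.calcHist([hsvA], [0, 1], None, [50, 60], [0, 180, 0, 256])
        histB = cv2.calcHist([hsvB], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(histA, histA, 0, 1, cv2.NORM_MINMAX)
        cv2.normalize(histB, histB, 0, 1, cv2.NORM_MINMAX)

        methods = {
            "Correlation":   cv2.HISTCMP_CORREL,         # 1 = perfect match
            "Chi-Square":    cv2.HISTCMP_CHISQR,         # 0 = perfect match
            "Intersection":  cv2.HISTCMP_INTERSECT,      # higher = better match
            "Bhattacharyya": cv2.HISTCMP_BHATTACHARYYA,  # 0 = perfect match
        }
        return {name: cv2.compareHist(histA, histB, flag)
                for name, flag in methods.items()}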
Backstory: in my country there is a picture of its founding father on every banknote denomination:
I want to find the similarity between these two images via SURF detectors. The system will be trained on both images. The user will present the bottom picture or the top picture via a webcam, and the similarity score between them will be used to find its denomination value.
My pseudocode:
1. Detect keypoints and the corresponding descriptors of both images via the SURF detector and descriptor.
2.a. Calculate the matches between the query and each trained example. Find the number of good matches / total number of matches for each image.
2.b. OR apply the RANSAC algorithm and find the highest number of consistent pairs between the query and each training image.
3. The one with the higher value gets the higher score and better similarity.
Is my method sound enough, or is there another method to find the similarity between two images when the query image undergoes various transformations? I have looked at solutions such as Manhattan distance or correlation, but none of them is adequate for this problem.
Yes, you are doing it the right way:
1) Create a training set and store all its feature points.
2) Perform the ratio test on matches between the query and the training feature points.
3) Apply the RANSAC test and draw the matches (apply a homography if you want to highlight the detected note).
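A minimal sketch of those three steps (it assumes an OpenCV build with the non-free xfeatures2d module for SURF; ORB could be substituted otherwise, and the thresholds are just common defaults):

    import cv2
    import numpy as np

    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    bf = cv2.BFMatcher(cv2.NORM_L2)

    def train(note_img):
        # 1) Store the feature points and descriptors of a training note.
        return surf.detectAndCompute(note_img, None)

    def score(query_img, trained):
        kp_t, des_t = trained
        kp_q, des_q = surf.detectAndCompute(query_img, None)

        # 2) Ratio test on the query/training matches.
        good = []
        for pair in bf.knnMatch(des_q, des_t, k=2):
            if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
                good.append(pair[0])

        # 3) RANSAC via a homography: count the geometrically consistent matches.
        if len(good) < 4:
            return 0
        src = np.float32([kp_q[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
        dst = np.float32([kp_t[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
        _, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        return int(mask.sum()) if mask is not None else 0

    # The denomination whose trained image gets the highest score wins.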
This paper might be helpful; they do a similar thing using SIFT.
Your algorithm looks fine, but you have much more information available that you can make use of. Here is a list of information you can use to further improve your results:
1. The location on the image where the denomination is written.
2. Information about how denominations are written (script knowledge).
3. Homographic information, since you know both the original image and the observed image.
Make use of all the above information to improve the result.
I have 2 objects. I get n features from object 1 & m features from object 2.
n!=m
I have to measure the probability that object 1 is similar to object 2.
How can I do this?
There is a nice tutorial in the OpenCV website that does this. Check it out.
The idea is to get the distances between all those descriptors with a FlannBasedMatcher, keep the closest ones, and run RANSAC to find a set of consistent features between the two objects. You don't get a probability, but rather the number of consistent features, from which you can score how good your detection is; how you do that is up to you.
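As one possible sketch of turning that into a bounded score (the normalization by the smaller feature count is my own choice, not part of the tutorial; the index/search parameters are the usual FLANN KD-tree defaults, suitable for float descriptors such as SIFT/SURF):

    import cv2
    import numpy as np

    def similarity(kp1, des1, kp2, des2):
        # kp1/des1: n features from object 1, kp2/des2: m features from object 2.
        flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
        good = []
        for pair in flann.knnMatch(des1, des2, k=2):
            if len(pair) == 2 and pair[0].distance < 0.7 * pair[1].distance:
                good.append(pair[0])

        # RANSAC keeps only the geometrically consistent matches.
        inliers = 0
        if len(good) >= 4:
            src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
            dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
            _, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
            inliers = int(mask.sum()) if mask is not None else 0

        # Not a true probability: just consistent matches over the smaller feature count.
        return inliers / float(min(len(kp1), len(kp2)))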
You can group the features in the regions of the image where they are densest and store each group in a vector. There may be multiple matches among them, from which you can choose the strongest one.
Are you talking about point feature descriptors, like SIFT, SURF, or FREAK?
In that case there are several strategies. In all cases you need a distance measure. For SIFT or SURF you can use the Euclidean distance between the descriptors, or the L1 norm, or the dot product (correlation). For binary features, like FREAK or BRISK, you typically use the Hamming distance.
Then one approach is to simply pick a threshold on the distance. This is likely to give you many-to-many matches. Another way is to use bipartite graph matching to find the minimum-cost or maximum-weight assignment between the two sets. A very practical approach is described by David Lowe, which uses a ratio test to discard ambiguous matches.
Many of these strategies are implemented in the matchFeatures function in the Computer Vision System Toolbox for MATLAB.
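The same strategies are easy to try in OpenCV as well; here is a small sketch (the norm follows the descriptor type, while the distance threshold and ratio are illustrative and must be tuned per descriptor):

    import cv2

    def match_strategies(desA, desB, binary=False, max_dist=64.0, ratio=0.75):
        # Hamming distance for binary descriptors (FREAK, BRISK, ORB), L2 for SIFT/SURF.
        norm = cv2.NORM_HAMMING if binary else cv2.NORM_L2

        # Strategy 1: absolute distance threshold (may yield many-to-many matches).
        thresholded = cv2.BFMatcher(norm).radiusMatch(desA, desB, maxDistance=max_dist)

        # Strategy 2: mutual nearest neighbours (crossCheck), a cheap stand-in
        # for a one-to-one assignment between the two sets.
        one_to_one = cv2.BFMatcher(norm, crossCheck=True).match(desA, desB)

        # Strategy 3: Lowe's ratio test to discard ambiguous matches.
        ratio_ok = []
        for pair in cv2.BFMatcher(norm).knnMatch(desA, desB, k=2):
            if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
                ratio_ok.append(pair[0])
        return thresholded, one_to_one, ratio_ok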
I'm trying to do some key feature matching in OpenCV, and for now I've been using cv::DescriptorMatcher::match and, as expected, I'm getting quite a few false matches.
Before I start to write my own filter and pruning procedures for the extracted matches, I wanted to try out the cv::DescriptorMatcher::radiusMatch function, which should only return the matches closer to each other than the given float maxDistance.
I would like to write a wrapper for the available OpenCV matching algorithms so that I could use them through an interface which allows for additional functionalities as well as additional extern (mine) matching implementations.
Since in my code there is only one concrete class acting as a wrapper for OpenCV feature matching (similarly to cv::DescriptorMatcher, it takes the name of the specific matching algorithm and constructs it internally through a factory method), I would also like to write a universal method implementing matching via cv::DescriptorMatcher::radiusMatch that would work for all the different matcher and feature choices (I have a similar wrapper that allows me to change between different OpenCV feature detectors and also implement some of my own).
Unfortunately, after looking through the OpenCV documentation and the cv::DescriptorMatcher interface, I just can't find any information about the distance measure used to calculate the actual distance between the matches. I found a pretty good matching example here using SURF features and descriptors, but I did not manage to work out the actual meaning of a specific value of the maxDistance argument.
Since I would like to compare the results I'd get when using different feature/descriptor combinations, I would like to know what kind of distance measure is used (and if it can easily be changed), so that I can use something that makes sense with all the combinations I try out.
Any ideas/suggestions?
Update
I've just printed out the feature distances I get when using cv::DescriptorMatcher::match with various feature/descriptor combinations, and what I got was:
MSER/SIFT order of magnitude: 100
SURF/SURF order of magnitude: 0.1
SURF/SIFT order of magnitude: 50
MSER/SURF order of magnitude: 0.2
From this I can conclude that whichever distance measure is applied to the features, it is definitely not normalized. Since I am using OpenCV's and my own interfaces to work with different feature extraction, descriptor calculation and matching methods, I would like to have some argument for ::radiusMatch that I could use with all (most) of the different combinations. (I've tried matching using the BruteForce and FlannBased matchers, and while the matches are slightly different, the distances between the matches are of the same order of magnitude for each of the combinations.)
Some context:
I'm testing this on two pictures acquired from a camera mounted on top of a (slow) moving vehicle. The images should be around 5 frames (1 meter of vehicle motion) apart, so most of the features should be visible, and not much different (especially those that are far away from the camera in both images).
The magnitude of the distance is indeed dependent on the type of feature used. That is because some specialized feature descriptors also come with a specialized feature matcher that makes optimal use of the descriptor. If you want to obtain weights for the match distances of different feature types, your best bet is probably to make a training set of a dozen or more 1:1 matches, unleash each feature detector/matcher on it, and normalize the distances so that each detector has an average distance of 1 over all matches. You can then use the obtained weights on other datasets.
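A sketch of that calibration (the data layout is purely illustrative: raw match distances collected per detector/matcher on the 1:1 training set):

    import numpy as np

    def calibrate_weights(training_distances):
        # training_distances: {"SIFT": [d1, d2, ...], "SURF": [...], ...}
        # Scale factor so each detector's average match distance becomes 1.
        return {name: 1.0 / np.mean(dists) for name, dists in training_distances.items()}

    # On other datasets: normalized_distance = weights[name] * raw_distance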
You should have a look at the following function in features2d.hpp in the OpenCV library.
template<class Distance> void BruteForceMatcher<Distance>::commonRadiusMatchImpl()
Usually we use the L2 distance to measure the distance between matches, but it depends on the descriptor you use. For example, the Hamming distance is useful for the BRIEF descriptor, since it counts the bit differences between two binary strings.
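To make that concrete, here is a tiny sketch comparing the two norms on hand-made binary descriptors (the byte values are arbitrary):

    import cv2
    import numpy as np

    # Two 4-byte binary descriptors (BRIEF/ORB descriptors are rows of uint8 like this).
    a = np.array([[0b10110010, 0b00001111, 0xFF, 0x00]], dtype=np.uint8)
    b = np.array([[0b10110000, 0b00001111, 0x0F, 0x00]], dtype=np.uint8)

    # Hamming distance counts differing bits: 1 bit in byte 0 + 4 bits in byte 2 = 5.
    print(cv2.norm(a, b, cv2.NORM_HAMMING))   # 5.0
    print(cv2.norm(a, b, cv2.NORM_L2))        # Euclidean distance, as used for SIFT/SURF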