Backstory: in my country, every banknote denomination carries a picture of the country's founding father:
I want to find the similarity between these two images via SURF detectors. The system will be trained on both images. The user will present either note via a webcam, and the similarity score between them will be used to determine its denomination value.
My pseudocode:
1. Detect keypoints and their corresponding descriptors in both images using the SURF detector and descriptor.
2.a. Compute the matches between the query and each trained example, and find (number of good matches) / (total number of matches) for each image.
2.b. OR apply the RANSAC algorithm and find the highest number of inlier correspondences between the query and each training image.
3. The image with the higher value gets the higher score and the better similarity.
Is my method sound enough, or is there another method to find the similarity between two images where the query image undergoes various transformations? I have looked at solutions such as Manhattan distance and correlation, but none of them are adequate for this problem.
Yes, you are doing it the right way:
1) Create a training set and store all its feature points.
2) Perform the ratio test on matches between the query and training feature points.
3) Apply a RANSAC test and draw the matches (apply the homography if you want to highlight the detected note).
This paper might be helpful; it does a similar thing using SIFT.
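A minimal sketch of those three steps, assuming opencv-contrib-python with the non-free xfeatures2d module is available (otherwise SIFT or ORB work the same way) and using hypothetical file names:

    import cv2
    import numpy as np

    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)

    train_img = cv2.imread('note_template.jpg', cv2.IMREAD_GRAYSCALE)  # hypothetical paths
    query_img = cv2.imread('webcam_frame.jpg', cv2.IMREAD_GRAYSCALE)

    # 1) Keypoints and descriptors for both images
    kp_t, des_t = surf.detectAndCompute(train_img, None)
    kp_q, des_q = surf.detectAndCompute(query_img, None)

    # 2) Ratio test on 2-NN matches
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = [m for m, n in matcher.knnMatch(des_q, des_t, k=2)
            if m.distance < 0.75 * n.distance]

    # 3) RANSAC homography; the inlier count is a robust per-denomination score
    score = 0
    if len(good) >= 4:
        src = np.float32([kp_q[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
        dst = np.float32([kp_t[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
        H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        score = int(mask.sum()) if mask is not None else 0
    print(score)

Run this against every trained note and take the denomination with the highest inlier count.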
Your algorithm looks fine, but you have much more information available that you can exploit. Here is a list of information you can use to further improve your results:
1. The location of the region on the note where the denomination is written.
2. Knowledge of how the denominations are written (the script).
3. Homographic information, since you know both the original image and the observed image.
Make use of all the above information to improve the result.
I am trying to write a function in OpenCV that compares two images, imageA and imageB, to check to what extent they are similar.
I want to arrive at three comparison scores (values from 0 to 100) as shown below.
1. Histograms - compareHist() : OpenCV method
2. Template Matching - matchTemplate() : OpenCV method
3. Feature Matching - BFMatcher() : OpenCV method
Based on the scores derived from the above calculations, I want to arrive at a conclusion regarding the match.
I was successful in getting these functions to work, but not at deriving a comparison score from them. It would be great if someone could help me with that. Any other advice regarding this sort of image matching is also welcome.
I know there are different kinds of algorithms that can be used for the above functions, so here is a clarification of the kind of images I will be using.
1. As mentioned above it will be a one-to-one comparison.
2. They are all images taken by a person using a mobile camera.
3. Matching images will mostly be taken of the same object/place from the same spot. (Depending on the time of day, the lighting could differ.)
4. If the images don't match, the user will be asked to take another one until it matches.
5. The kinds of images compared could include a corridor, an office table, a computer screen (where the on-screen content is to be compared), a paper document, etc.
1- With histograms you can get a comparison score using histogram intersection. Dividing the intersection of the two histograms by their union gives you a score between 0 (no match at all) and 1 (complete match).
You can compute the histogram intersection with a simple for loop.
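A short sketch of that idea, assuming grayscale inputs and using NumPy instead of an explicit loop:

    import cv2
    import numpy as np

    def histogram_score(img_a, img_b, bins=256):
        """Intersection-over-union of two grayscale histograms, in [0, 1]."""
        h_a = cv2.calcHist([img_a], [0], None, [bins], [0, 256]).ravel()
        h_b = cv2.calcHist([img_b], [0], None, [bins], [0, 256]).ravel()
        intersection = np.minimum(h_a, h_b).sum()
        union = np.maximum(h_a, h_b).sum()
        return float(intersection / union)  # 0 = no overlap, 1 = identical histograms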
2- In template matching, the score you get differs for each comparison method. In this link you can see the details of each method. In some methods the highest score means the best match, but in others the lowest score means the best match. To define a score between 0 and 1, you should consider two reference scores: one for matching an image with itself (the highest possible score) and one for matching two completely different images (the lowest), and then normalize the scores by the number of pixels in the image (height*width).
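One possible reading of that, as a sketch only: use the squared-difference method on equally sized 8-bit grayscale images, normalize by the pixel count, and map the result onto [0, 1] against a crude worst-case bound.

    import cv2

    def template_match_score(img_a, img_b):
        """Rough 0..1 similarity from TM_SQDIFF on equally sized 8-bit grayscale images."""
        h, w = img_a.shape[:2]
        # With equally sized images the result is a single value: the total squared difference.
        sqdiff = cv2.matchTemplate(img_b, img_a, cv2.TM_SQDIFF)[0, 0]
        mse = sqdiff / (h * w)        # normalize by the number of pixels
        worst = 255.0 ** 2            # assumed worst case for 8-bit images
        return 1.0 - min(mse / worst, 1.0)  # 1 = identical, 0 = completely different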
3- Feature matching is different from the last two methods. You may have two similar images with poor features (which then fail to match), or two conceptually different images with many matched features. Still, if the images are feature-rich, we can define a score. For this purpose, consider this example:
Img1 has 200 features
Img2 has 170 features
These two images have 100 matched features
Consider 0.5 (100/200) as the whole image matching score
You can also factor the distances between the matched pairs of features into the scoring, but I think that's enough.
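A sketch of that scoring, assuming SIFT/SURF-style float descriptors and a ratio test to decide which matches count as "matched features":

    import cv2

    def feature_match_score(des_a, des_b, ratio=0.75):
        """Good matches divided by the larger descriptor count, e.g. 100 / 200 = 0.5."""
        matcher = cv2.BFMatcher(cv2.NORM_L2)
        pairs = matcher.knnMatch(des_a, des_b, k=2)
        good = [p[0] for p in pairs if len(p) == 2 and p[0].distance < ratio * p[1].distance]
        return len(good) / max(len(des_a), len(des_b))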
Regarding the comparison score: have you tried implementing a weighted average to get a final comparison metric? Weight the three matching methods you are implementing according to their accuracy; the best method gets the “heaviest” weight.
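For instance, with hypothetical weights that you would tune on your own data:

    def combined_score(hist_score, template_score, feature_score, weights=(0.2, 0.3, 0.5)):
        # The weights are placeholders; give the most reliable method the largest one.
        w_h, w_t, w_f = weights
        return w_h * hist_score + w_t * template_score + w_f * feature_score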
Also, if you want to explore additional matching methods, give FFT-based matching a try: http://machineawakening.blogspot.com/2015/12/fft-based-cosine-similarity-for-fast.html
I'm implementing a cache for virtual reality applications: given an input image query, return the result associated with the most visually similar cached image (i.e. a previously processed query), if the distance between the query representation and the cached image representation is lower than a certain threshold. Our cache is relatively small and contains 10k image representations.
We use VLAD codes [1] as image representation since they are very compact and incredibly fast to compute (around 1 ms).
However, it has been shown in [2] that the distance between the query code and the images in the dataset (the cache in this case) varies a lot from query to query, so it's not trivial to find an absolute threshold. The same work proposes a method for object detection applications, which is not relevant in this context (we return just the most similar image, not all and only the images containing the query subject).
[3] offers a very precise method, but at the same time it's very expensive and returns short lists. It's based on re-ranking by spatial feature matching; if you want more details, the quoted section is at the end of this question. I'm not an expert in computer vision, but this step sounds to me a lot like running a feature matcher on the short-list of the top-k elements according to the image representation and re-ranking them by the number of matched features. My first question is: is that correct?
In our case this approach is not a problem, since most of the time the top-10 most similar VLAD codes contain the query subject, so we would only need to do this spatial matching step on 10 images.
However, at this point I have a second question: if we had the problem of deciding an absolute threshold for image representations (such as VLAD codes), does this problem still persist with this approach? In the first case, the threshold was "the L2 distance between the query VLAD code and the closest VLAD code"; here, the threshold value would instead represent "the number of features matched between the query image and the image that is closest according to the VLAD codes".
Of course, my second question only makes sense if the answer to the first one is positive.
The approach of [3]:
Geometrical Re-ranking verifies the global geometrical consistency between matches (Lowe 2004; Philbin et al. 2007) for a short-list of database images returned by the image search system. Here we implement the approach of Lowe (2004) and apply it to a short-list of 200 images. We first obtain a set of matches, i.e., each descriptor of the query image is matched to the 10 closest ones in all the short-list images. We then estimate an affine 2D transformation in two steps. First, a Hough scheme estimates a transformation with 4 degrees of freedom. Each pair of matching regions generates a set of parameters that “vote” in a 4D histogram. In a second step, the sets of matches from the largest bins are used to estimate a finer 2D affine transform. The images for which the geometrical estimation succeeds are returned in first positions and ranked with a score based on the number of inliers. The images for which the estimation failed are appended to the geometrically matched ones, with their order unchanged.
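Assuming the reading in the first question is right, a rough sketch of that re-ranking step on the top-k short-list could look like this (using a ratio test on local descriptors and OpenCV's RANSAC-based 4-DOF estimator in place of the Hough voting stage):

    import cv2
    import numpy as np

    def rerank_shortlist(query_kp, query_des, shortlist, ratio=0.8):
        """shortlist: list of (image_id, keypoints, descriptors) for the top-k VLAD results."""
        matcher = cv2.BFMatcher(cv2.NORM_L2)
        scored = []
        for image_id, kp, des in shortlist:
            pairs = matcher.knnMatch(query_des, des, k=2)
            good = [p[0] for p in pairs if len(p) == 2 and p[0].distance < ratio * p[1].distance]
            inliers = 0
            if len(good) >= 3:
                src = np.float32([query_kp[m.queryIdx].pt for m in good])
                dst = np.float32([kp[m.trainIdx].pt for m in good])
                # 4-DOF (similarity) transform with RANSAC, standing in for the Hough voting step.
                M, mask = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
                inliers = int(mask.sum()) if mask is not None else 0
            scored.append((image_id, inliers))
        # Geometrically verified images bubble to the top, ranked by inlier count;
        # the inlier count itself could serve as the absolute threshold in question 2.
        return sorted(scored, key=lambda t: t[1], reverse=True)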
I have implemented the SIFT algorithm in OpenCV for feature detection and matching using the following steps:
Background Removal using Otsu's thresholding
Feature Detection using SIFT feature detector
Descriptor Extraction using SIFT feature extractor
Matching feature vectors using BFMatcher (L2 norm) and using the ratio test to filter good matches
My objective is to classify images into different categories such as shoes, shirts, etc. based on their similarity. For example, two different heels should be more similar to each other than a heel and a sports shoe, or a heel and a t-shirt.
However, this algorithm works well only when my template image is present in the search image (at any scale and orientation). If I compare two different heels, they don't match well and the matches are also random (the heel of one image matches the flat surface of the other image). There are also many false positives when I compare a heel with a sports shoe, a heel with a t-shirt, or a heel with the picture of a baby!
I would like to look at a heel and identify it as a heel, and return how similar the heel is to different images in my database, giving maximum similarity for other heels, followed by other shoes. It should not produce any similarity with irrelevant objects such as shirts, phones, or pens.
I understand that the SIFT algorithm produces a descriptor vector for each keypoint based on the gradient values of the pixels around the keypoint, and that images are matched purely on this attribute. Hence it is quite possible that a keypoint located near the heel of one shoe is matched to a keypoint on the surface of the other shoe. What I gather is that this algorithm can only be used to detect exact matches, not to measure similarity between images.
Could you please tell me whether this algorithm can be used for my objective, whether I am doing something wrong, or suggest another approach that I should use?
For classification of similar objects, I would certainly go for cascade classifiers.
Basically, a cascade classifier is a machine-learning method where you train your classifier to detect an object in different images. For it to work well, you need to train your classifier with a lot of positive images (where your object is present) and negative images (where it is not). The method was invented by Viola and Jones in 2001.
There is a ready-made implementation in OpenCV for face detection; you will find a bit more explanation in the OpenCV documentation (sorry, can't post the link, I'm limited to 1 link for the moment).
Now, for the caveats:
First, you need a lot of positive and negative images. The more images you have, the better the algorithm will perform. Beware of overfitting: if your training dataset for heels contains, for instance, too many images of a given model, it is possible that other models will not be detected properly.
Training the cascade classifier can be long and difficult. The end result will depend on how well you choose the parameters for training the classifier. Some info on this can be found on this webpage: http://coding-robin.de/2013/07/22/train-your-own-opencv-haar-classifier.html
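Once you have a trained cascade XML (here 'heel_cascade.xml' is just a hypothetical file name), using it at detection time is short:

    import cv2

    cascade = cv2.CascadeClassifier('heel_cascade.xml')   # hypothetical trained cascade
    img = cv2.imread('test_shoe.jpg')                      # hypothetical test image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Returns a list of (x, y, w, h) rectangles where the classifier fires.
    detections = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in detections:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

    cv2.imwrite('detections.jpg', img)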
When I run an SVM with a training set and a validation set, I check the results with a confusion matrix, and all is good.
After that, how can I implement a "query by example" system: I give it a picture and it returns the most similar image in an image set (based on a threshold)?
Are there examples in Python (with the scikit-learn module)?
Retrieving similar images does not need a classifier. It is usually a nearest-neighbor problem, most likely with high-dimensional features. The keys are to decide:
Which characteristic of the image do you care about most? Shape? Colors? Color distribution? Objects?
What are the best features to describe that characteristic? If it's color, you might want R/G/B values or histograms.
How do you want to measure similarity, i.e., which distance function?
Which algorithm do you want to use in such a nearest-neighbor setup? Options include, but are not limited to, kd-trees, locality-sensitive hashing, etc. There are some good discussions here.
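A small Python sketch of the whole pipeline, assuming colour is the characteristic you care about, colour histograms are the features, Euclidean distance is the metric, and scikit-learn's NearestNeighbors is the index (file names are hypothetical):

    import cv2
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def color_histogram(path, bins=8):
        """A normalized 3D colour histogram flattened into a feature vector."""
        img = cv2.imread(path)
        hist = cv2.calcHist([img], [0, 1, 2], None, [bins] * 3, [0, 256] * 3)
        return cv2.normalize(hist, hist).ravel()

    database_paths = ['img1.jpg', 'img2.jpg', 'img3.jpg']   # your image set
    features = np.array([color_histogram(p) for p in database_paths])

    index = NearestNeighbors(n_neighbors=1, metric='euclidean').fit(features)
    dist, idx = index.kneighbors([color_histogram('query.jpg')])

    threshold = 0.5   # tune on your own data
    if dist[0][0] < threshold:
        print('most similar:', database_paths[idx[0][0]])
    else:
        print('no image in the set is below the threshold')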
I am using the OpenCV SURF tracker to find matching points in two images.
As you know, SURF returns many feature points in both images. What I want to do is use these feature parameters to find out which matches are exactly correct (true positive matches). In my application I need only true positive matches.
These parameters are available: Hessian, Laplacian, Distance, Size, Dir.
I don't know how to use these parameters.
Do exact matches have a smaller distance or a larger Hessian? Can the Laplacian help? Can size or direction help?
How can I find exact matches (true positives)?
You can find very decent matches between descriptors in the query and the image by adopting the following strategy:
Use a 2-NN search for the query descriptors among the image descriptors, and apply the following condition
if distance(1st match) < 0.6 * distance(2nd match), the 1st match is a "good match"
to filter out false positives.
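In OpenCV terms, that check (with the 0.6 factor from above) is roughly:

    import cv2

    def filter_good_matches(query_descriptors, image_descriptors, ratio=0.6):
        """Keep a match only if it is clearly closer than the second-best candidate."""
        matcher = cv2.BFMatcher(cv2.NORM_L2)
        pairs = matcher.knnMatch(query_descriptors, image_descriptors, k=2)
        return [p[0] for p in pairs if len(p) == 2 and p[0].distance < ratio * p[1].distance]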
It's obvious you can't be 100% sure which points truly match. You can increase the number of true positives (at the cost of performance) by tuning the SURF parameters (see some links here). Depending on your actual task, you can use robust algorithms to eliminate outliers, e.g. RANSAC if you are performing some kind of model fitting. Also, as Erfan said, you can use spatial information (check out "Elastic Bunch Graph Matching" and spatial BoW).
The answer I'm about to post is just a guess, because I have not tested it to see whether it works exactly as predicted.
By computing the relative polar distances between 3 random candidate feature points returned by OpenCV and comparing them with the counterpart points in the template (within a certain error), you can compute not only the probability that a match is a true positive, but also the angle and scale of your matched pattern.
Cheers!