What is an efficient and correct metric I can use to compare two images in matrix form? I have built a machine learning model which predicts an image, and I want to see how far off it is from the target using a single number for easy comparison.
There are many different methods you can use. I guess the most popular ones are:
Euclidean Distance
Chord Distance
Pearson’s Correlation Coefficient
Spearman Rank Coefficient
You can also read about these and other metrics (their main advantages and drawbacks) here: Image Registration - Principles, Tools and Methods / Author: Goshtasby, A. Ardeshir
DOI: 10.1007/978-1-4471-2458-0
Hope it helps.
Adding to the excellent start from Victor Oliveira Antonino, I suggest starting with either Pearson's or Cosine. The rank coefficient isn't particularly applicable for this space; Euclidean and chord distance have properties that don't match our human interpretation of image similarity as well.
Each metric has advantages and disadvantages. When you get into an application that doesn't map readily to physical distance, then Euclidean distance is unlikely to be the best choice.
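To make the comparison concrete, here is a minimal sketch of three of the metrics named above, computed on two small images stored as lists of rows. The helper names (`flatten`, `euclidean_distance`, `cosine_similarity`, `pearson_correlation`) and the toy pixel values are my own illustration, not from any library; with real images you would more likely use NumPy arrays.

```python
import math

def flatten(image):
    """Flatten a 2-D image (list of rows) into one list of float pixel values."""
    return [float(p) for row in image for p in row]

def euclidean_distance(a, b):
    """Straight-line distance between the two images seen as long vectors."""
    x, y = flatten(a), flatten(b)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))

def cosine_similarity(a, b):
    """Cosine of the angle between the two pixel vectors (undefined for an all-zero image)."""
    x, y = flatten(a), flatten(b)
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm

def pearson_correlation(a, b):
    """Cosine similarity after subtracting each image's mean (undefined for a constant image)."""
    x, y = flatten(a), flatten(b)
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    xc = [p - mean_x for p in x]
    yc = [q - mean_y for q in y]
    num = sum(p * q for p, q in zip(xc, yc))
    den = math.sqrt(sum(p * p for p in xc)) * math.sqrt(sum(q * q for q in yc))
    return num / den

# Toy data: the prediction is the target with every pixel brightened by 10.
target = [[0, 50], [100, 150]]
predicted = [[10, 60], [110, 160]]

print(euclidean_distance(target, predicted))   # penalises the brightness shift
print(cosine_similarity(target, predicted))    # slightly below 1
print(pearson_correlation(target, predicted))  # ignores the constant shift
```

In this toy case Pearson treats the two images as perfectly correlated because a uniform brightness offset is removed by mean-centering, while Euclidean distance reports the offset as error. Which behaviour you want depends on whether a brightness shift should count as a mistake for your model.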
Related
What are some analysis functions which can be used on the K-Medoids algorithms?
My main aim is to compare results of 2 different clustering results in order to see which is better.
Can SSE (sum of squared errors) be applied to K-Medoids algorithm?
The original k-medoid publication discusses the ESS measure, along with several other measures, such as average dissimilarity, maximum dissimilarity, and diameter, that may be more appropriate to use.
SSE is closely related to Euclidean distance, so it usually is not appropriate (unless, of course, you use Euclidean; but why would you use k-medoids then instead of k-means?)
ARI, NMI, and the Silhouette Coefficient can be used to compare the results.
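As a rough sketch of the average and maximum dissimilarity measures mentioned above, the following computes both for two candidate clusterings under an arbitrary distance function. The function name `medoid_dissimilarities`, the Manhattan distance choice, and the toy points and labels are all hypothetical; the second labelling is a deliberately bad assignment for contrast, not the output of a real k-medoids run.

```python
def manhattan(p, q):
    """City-block distance; any dissimilarity function could be passed instead."""
    return sum(abs(a - b) for a, b in zip(p, q))

def medoid_dissimilarities(points, labels, medoids, dist):
    """Return (average, maximum) dissimilarity between each point and its cluster's medoid."""
    d = [dist(p, medoids[label]) for p, label in zip(points, labels)]
    return sum(d) / len(d), max(d)

# Toy data: two well-separated groups, with one medoid per cluster.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 12)]
medoids = {0: (0, 0), 1: (10, 10)}

labels_good = [0, 0, 0, 1, 1, 1]
labels_bad = [0, 0, 1, 1, 1, 1]   # (0, 1) wrongly assigned to the far cluster

avg_good, max_good = medoid_dissimilarities(points, labels_good, medoids, manhattan)
avg_bad, max_bad = medoid_dissimilarities(points, labels_bad, medoids, manhattan)
print(avg_good, max_good)
print(avg_bad, max_bad)
```

Lower values indicate tighter clusters, and because the distance function is a parameter, the same code works with whatever dissimilarity the k-medoids run itself used.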
I've read something about Fisher Vectors and I'm still in the learning process. It's a better representation than the classic BoF representation, exploiting GMM (or k-means, even if that's usually referred to as VLAD).
However, I've seen that they are usually used for classification problems, for example with SVM.
But what about image retrieval? I've seen that they have been used for image retrieval too (here), but I don't understand one point: given two FVs representing two images, how do we compute the distance between them, i.e. how similar the two images are?
Is it reasonable to use them in such a context?
As seen in the two papers below, Euclidean distance seems to be the popular choice. There are also references to using dot-product as a similarity measure; cosine similarity (closely related) is a generally popular metric for ML similarity.
http://link.springer.com/article/10.1007/s11263-013-0636-x
http://www.robots.ox.ac.uk/~vgg/publications/2013/Simonyan13/simonyan13.pdf
Is this enough to let you choose something that meets your needs?
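A small sketch of why Euclidean distance, dot product, and cosine similarity are so closely related here: if the vectors are L2-normalized (an assumption on my part, not something stated above), the dot product equals the cosine similarity, and the squared Euclidean distance is 2 - 2 * cosine, so all three rank retrieval results identically. The vectors below are made-up toy values standing in for Fisher Vectors, which are far longer in practice.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Toy stand-ins for the Fisher Vectors of two images.
fv_a = l2_normalize([0.5, -1.0, 2.0, 0.0])
fv_b = l2_normalize([0.4, -0.8, 2.2, 0.1])

cosine = dot(fv_a, fv_b)          # dot product of unit vectors is the cosine
distance = euclidean(fv_a, fv_b)

print(cosine)
print(distance ** 2, 2 - 2 * cosine)  # the two values agree up to rounding
```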
I am using the Mahout recommenditembased algorithm. What are the differences between all the --similarity classes available? How do I know which is the best choice for my application? These are my choices:
SIMILARITY_COOCCURRENCE
SIMILARITY_LOGLIKELIHOOD
SIMILARITY_TANIMOTO_COEFFICIENT
SIMILARITY_CITY_BLOCK
SIMILARITY_COSINE
SIMILARITY_PEARSON_CORRELATION
SIMILARITY_EUCLIDEAN_DISTANCE
What does each one mean?
I'm not familiar with all of them, but I can help with some.
Cooccurrence is how often two items occur with the same user. http://en.wikipedia.org/wiki/Co-occurrence
Log-Likelihood similarity is, as far as I know, based on a log-likelihood ratio test: it scores how unlikely it is that two items occur together for the same users purely by chance. http://en.wikipedia.org/wiki/Log-likelihood
Not sure about tanimoto
City block is the distance between two instances if you assume you can only move around like you're in a checkerboard-style city. http://en.wikipedia.org/wiki/Taxicab_geometry
Cosine similarity is the cosine of the angle between the two feature vectors. http://en.wikipedia.org/wiki/Cosine_similarity
Pearson Correlation is the covariance of the two feature vectors divided by the product of their standard deviations. http://en.wikipedia.org/wiki/Pearson_correlation_coefficient
Euclidean distance is the standard straight line distance between two points. http://en.wikipedia.org/wiki/Euclidean_distance
To determine which is best for your application, you most likely need some intuition about your data and what it means. If your features are continuous-valued, then something like Euclidean distance or Pearson correlation makes sense. If your values are more discrete, then something along the lines of city block or cosine similarity may make more sense.
Another option is to set up a cross-validation experiment where you see how well each similarity metric works to predict the desired output values and select the metric that works the best from the cross-validation results.
Tanimoto and Jaccard are similar: the Jaccard index is a statistic used for comparing the similarity and diversity of sample sets.
https://en.wikipedia.org/wiki/Jaccard_index
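To illustrate a few of the definitions above, here is a toy sketch that treats each item as the set of users who interacted with it. This is not Mahout's implementation (Mahout may scale or transform these values differently); the function names and the user IDs and ratings are invented for the example.

```python
def cooccurrence(users_a, users_b):
    """Number of users who have both items."""
    return len(users_a & users_b)

def tanimoto(users_a, users_b):
    """Jaccard/Tanimoto coefficient: shared users divided by all users of either item."""
    union = users_a | users_b
    return len(users_a & users_b) / len(union) if union else 0.0

def city_block(ratings_a, ratings_b):
    """Taxicab distance between two equally long rating vectors."""
    return sum(abs(x - y) for x, y in zip(ratings_a, ratings_b))

# Hypothetical users who interacted with each item.
item_a = {"u1", "u2", "u3"}
item_b = {"u2", "u3", "u4", "u5"}

print(cooccurrence(item_a, item_b))        # 2 shared users
print(tanimoto(item_a, item_b))            # 2 shared out of 5 distinct users
print(city_block([5, 3, 1], [4, 3, 3]))    # |5-4| + |3-3| + |1-3|
```

Note that co-occurrence and Tanimoto only need to know whether a user touched an item, while city block (like cosine, Pearson, and Euclidean) needs actual preference values.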
I have 2 objects. I get n features from object 1 & m features from object 2.
n!=m
I have to measure the probability that object 1 is similar to object 2.
How can I do this?
There is a nice tutorial in the OpenCV website that does this. Check it out.
The idea is to get the distances between all those descriptors with a FlannBasedMatcher, get the closest ones, and run RANSAC to find some set of consistent features between the two objects. You don't get a probability, but the number of consistent features, from which you may score how good your detection is, but that is up to you.
You can group the features by the regions of the image where they are densest.
Build a vector from each group and compare those. There may be multiple matches among them, from which you can choose the highest-scoring one.
Are you talking about point feature descriptors, like SIFT, SURF, or FREAK?
In that case there are several strategies. In all cases you need a distance measure. For SIFT or SURF you can use the Euclidean distance between the descriptors, or the L1 norm, or the dot product (correlation). For binary features, like FREAK or BRISK, you typically use the Hamming distance.
Then, one approach is to simply pick a threshold on the distance. This is likely to give you many-to-many matches. Another way is to use bipartite graph matching to find the minimum-cost or maximum-weight assignment between the two sets. A very practical approach is described by David Lowe, which uses a ratio test to discard ambiguous matches.
Many of these strategies are implemented in the matchFeatures function in the Computer Vision System Toolbox for MATLAB.
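For readers without MATLAB, the ratio test can be sketched in a few lines. This is a brute-force illustration with Euclidean distance, not the OpenCV or MATLAB implementation; the function name `ratio_test_matches`, the 0.75 threshold, and the 2-D toy descriptors are assumptions chosen for the example (real SIFT descriptors have 128 dimensions). It works with n != m because each descriptor of the first object is simply matched against all descriptors of the second.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def ratio_test_matches(desc1, desc2, ratio=0.75):
    """For each descriptor in desc1, find its two nearest neighbours in desc2 and
    keep the match only if the best distance is clearly smaller than the second best."""
    matches = []
    for i, d in enumerate(desc1):
        ranked = sorted((euclidean(d, e), j) for j, e in enumerate(desc2))
        if len(ranked) < 2:
            continue  # the ratio test needs at least two candidates
        (best, j), (second, _) = ranked[0], ranked[1]
        if best < ratio * second:
            matches.append((i, j, best))
    return matches

# Toy descriptors: n = 3 features on object 1, m = 4 features on object 2.
desc1 = [(0.0, 0.0), (5.0, 5.0), (2.5, 2.5)]
desc2 = [(0.1, 0.0), (5.0, 5.1), (9.0, 9.0), (0.0, 5.0)]

matches = ratio_test_matches(desc1, desc2)
for i, j, dist in matches:
    print(f"feature {i} of object 1 matches feature {j} of object 2 (distance {dist:.3f})")
```

The third descriptor of object 1 is roughly equidistant from several candidates, so it is discarded as ambiguous. The number of surviving matches (optionally after a geometric check such as RANSAC) can then serve as a similarity score, though it is a score rather than a probability.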
I have been reading about pattern recognition, and I want to make a survey of methods for evaluating the similarity of vectors. As far as I know, there are Euclidean distance, Mahalanobis distance, and cosine distance. Can anyone suggest some more names or keywords to search for?
Also mutual neighbor distance (MND), Minkowski metric, Hausdorff distance, conceptual similarity, normalized Google distance, KL divergence, Spearman’s rank correlation, and Lin similarity. (Not all of these are vector based.)
I highly recommend Pattern Classification by Duda, Hart, and Stork for further reading. It is extensively cited.
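As one example of how these names relate, the Minkowski metric is a family that contains several of the others: order p = 1 gives the Manhattan (city block) distance, p = 2 gives the Euclidean distance, and the limit of large p approaches the Chebyshev (maximum coordinate difference) distance. A minimal sketch, with function names of my own choosing:

```python
def minkowski(u, v, p):
    """Minkowski distance of order p (p >= 1) between two equally long vectors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

def chebyshev(u, v):
    """Limit of the Minkowski distance as p grows: the largest coordinate difference."""
    return max(abs(a - b) for a, b in zip(u, v))

u, v = (0, 0), (3, 4)
print(minkowski(u, v, 1))    # Manhattan: 3 + 4
print(minkowski(u, v, 2))    # Euclidean: sqrt(9 + 16)
print(minkowski(u, v, 50))   # already close to the Chebyshev value
print(chebyshev(u, v))
```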
Pearson, Manhattan, Gower, Jaccard, Tanimoto, Russell-Rao, Dice, Kulczynski, Simple Matching, Levenshtein
You can define your own distance metrics too, so I would say there can be A LOT of possible distance metrics. Whether those metrics are good or have any meaning is another story.
Hamming distance
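Hamming distance counts the positions at which two equally long sequences differ. A short sketch with two hypothetical helpers, one for bit patterns stored as integers (as with binary descriptors) and one for general sequences such as strings:

```python
def hamming_bits(a, b):
    """Number of differing bits between two non-negative integers."""
    return bin(a ^ b).count("1")

def hamming_seq(s, t):
    """Number of differing positions between two sequences of equal length."""
    if len(s) != len(t):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(s, t))

print(hamming_bits(0b10110, 0b10011))     # XOR is 0b00101, so 2 bits differ
print(hamming_seq("karolin", "kathrin"))  # differs at 3 positions
```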