How does ImageMagick's '-subimage-search' operation work? - image-processing

I use ImageMagick in my application to compare images with the compare command and its -subimage-search option.
But there is very little documentation about how -subimage-search works.
Can anyone provide more information on how it works? For example:
Does it compare using a color model, or does it use image segmentation to achieve its task?
What I know right now is that it searches for the second image in the first.
But how is this done? Please explain.

Warning: Conducting a subimage-search is slow -- extremely slow even.
Theory
This slowness is due to how the subimage search is designed to work: it carries out a compare of the small image at every possible position within the larger image (against the area it currently covers at that position).
The basic command to use -subimage-search is this:
compare -subimage-search largeimage.ext subimage.ext resultimage.ext
As a result of this command you should get not one, but two images:
resultimage-0.ext : this image should display the (best) matching location.
resultimage-1.ext : this should be a "heatmap" of potential top-left corner locations.
The second image (map of locations) displays how well the sub-image matches at the respective position: the brighter the pixel, the better the match.
The "map" image has smaller dimensions, because it contains only locations or each potential top-left corner of the sub-image while fitting completely into the larger one. Its dimensions are:
width = width_of_largeimage - width_of_subimage + 1
height = height_of_largeimage - height_of_subimage + 1
The searching itself is conducted on the basis of differences of color vectors. Therefore it should result in fairly accurate color comparisons.
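To make this mechanism concrete, here is a minimal NumPy sketch of such a brute-force search (my own illustration, not ImageMagick's actual code; the file names are placeholders). It computes an RMSE score for every offset, and the resulting score map has exactly the dimensions given above:

import numpy as np
from PIL import Image

def brute_force_subimage_search(large, sub):
    """Slide `sub` over every possible offset in `large` and return an RMSE map.

    The map has shape (H - h + 1, W - w + 1), matching the dimensions of the
    second result image; here a LOWER value means a BETTER match.
    """
    H, W, _ = large.shape
    h, w, _ = sub.shape
    scores = np.empty((H - h + 1, W - w + 1))
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            window = large[y:y + h, x:x + w].astype(float)
            diff = window - sub.astype(float)           # per-channel color differences
            scores[y, x] = np.sqrt(np.mean(diff ** 2))  # RMSE over the covered area
    return scores

# Placeholder file names; any pair of images where `sub` fits inside `large` works.
# (This pure-Python loop is, of course, as slow as the warning above suggests.)
large = np.asarray(Image.open("largeimage.png").convert("RGB"))
sub = np.asarray(Image.open("subimage.jpg").convert("RGB"))
scores = brute_force_subimage_search(large, sub)
y, x = np.unravel_index(scores.argmin(), scores.shape)
print("best match at %d,%d (RMSE %.2f)" % (x, y, scores[y, x]))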
In order to improve the efficiency and speed of searching, you could follow this strategic plan:
First, compare a very, very small sub-image of the sub-image with the larger image. This should narrow down the possible locations faster.
Then use the results from step 1 to conduct a difference compare at each previously discovered potential location for more accurate matches.
Practical Example
Let's create two different images first:
convert rose: subimage.jpg
convert rose: -mattecolor blue -frame 20x5 largeimage.png
The first image, subimage.jpg (on the left), being a JPEG, will have some lossiness in its color encoding, so the sub-image cannot possibly produce an exact match.
The main difference of the second image, largeimage.png (on the right), is the blue frame around the main part:
Now time the compare-command:
time compare -subimage-search largeimage.png subimage.jpg resultimage.png
@ 40,5
real 0m17.092s
user 0m17.015s
sys 0m0.027s
Here are the results:
resultimage-0.png (displaying best matching location) on the left;
resultimage-1.png (displaying the "heatmap" of potential matches) on the right.
Conclusion: Incorrect result? Bug?
Looking at the resulting images, and knowing how the two images were constructed, it seems to me that the result is not correct:
The command should have returned @ 20,5 instead of @ 40,5.
The resultimage-0.png should have the red area moved to the left by 20 pixels.
The heatmap, resultimage-1.png, seems to indicate the best matching location with the darkest pixel; maybe my statement above, "the brighter the pixel, the better the match", was wrong, and it should be "the darker the pixel..."?
I'll submit a bug report to the ImageMagick developers and see what they have to say about it....
Update
As suggested by @dlemstra, an ImageMagick developer, I tested adding a -metric operation to the subimage-search. This operation returns a numerical value indicating the closeness of a match. There are various metrics available, which can be listed with
convert -list metric
This returns the following list on my notebook (running ImageMagick v6.9.0-0 Q16 x86_64):
AE Fuzz MAE MEPP MSE NCC PAE PHASH PSNR RMSE
The meanings of these abbreviations are:
AE : absolute error count, number of different pixels (-fuzz affected)
Fuzz : mean color distance
MAE : mean absolute error (normalized), average channel error distance
MEPP : mean error per pixel (normalized mean error, normalized peak error)
MSE : mean error squared, average of the channel error squared
NCC : normalized cross correlation
PAE : peak absolute error (normalized peak absolute error)
PHASH : perceptual hash
PSNR : peak signal to noise ratio
RMSE : root mean squared error (normalized root mean squared error)
An interesting (and relatively recent) metric is phash ('perceptual hash'). It is the only one that does not require identical dimensions for comparing images directly (without the -subimage-search option). It is normally the best metric to narrow down similar-looking images (or at least to reliably exclude those image pairs which look very different) without really "looking at them", on the command line and programmatically.
I did run the subimage-search with all these metrics, using a loop like this:
for m in $(convert -list metric); do
    echo "METRIC $m";
    compare -metric "$m" \
        -subimage-search \
        largeimage.png \
        subimage.jpg \
        resultimage---metric-${m}.png;
    echo;
done
This was the command output:
METRIC AE
compare: images too dissimilar `largeimage.png' @ error/compare.c/CompareImageCommand/976.
METRIC Fuzz
1769.16 (0.0269957) @ 20,5
METRIC MAE
1271.96 (0.0194089) @ 20,5
METRIC MEPP
compare: images too dissimilar `largeimage.png' @ error/compare.c/CompareImageCommand/976.
METRIC MSE
47.7599 (0.000728769) @ 20,5
METRIC NCC
0.132653 @ 40,5
METRIC PAE
12850 (0.196078) @ 20,5
METRIC PHASH
compare: images too dissimilar `largeimage.png' @ error/compare.c/CompareImageCommand/976.
METRIC PSNR
compare: images too dissimilar `largeimage.png' @ error/compare.c/CompareImageCommand/976.
METRIC RMSE
1769.16 (0.0269957) @ 20,5
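Before interpreting these results, a note on the two numbers printed for each successful metric: the first is the raw error, and the value in parentheses appears to be the same value normalized by the quantum range (65535 on this Q16 build). A quick check (my own, not part of the compare output) supports this reading:

QUANTUM_RANGE = 65535  # 2**16 - 1 on a Q16 build of ImageMagick

# raw values taken from the compare output above
for metric, raw in [("Fuzz", 1769.16), ("MAE", 1271.96), ("MSE", 47.7599), ("PAE", 12850)]:
    print("%-4s %.6f" % (metric, raw / QUANTUM_RANGE))
# Fuzz 0.026996
# MAE  0.019409
# MSE  0.000729
# PAE  0.196078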
So the following metric settings did not work at all with -subimage-search, as also indicated by the "images too dissimilar" message:
PSNR, PHASH, MEPP, AE
(I'm actually a bit surprised that the failed metrics include the PHASH one here. This may require further investigations...)
The following resultimages looked largely correct:
resultimage---metric-RMSE.png
resultimage---metric-Fuzz.png
resultimage---metric-MAE.png
resultimage---metric-MSE.png
resultimage---metric-PAE.png
The following result images look just as incorrect as my first run above, where no -metric was specified:
resultimage---metric-NCC.png (also returning the same incorrect coordinates of @ 40,5)
Here are the two resulting images for -metric RMSE (which Dirk Lemstra had suggested using):

Related

Imbalanced dataset, size limitation of 60mb, email categorization

I have a highly imbalanced dataset (approx. 1:100) of 1 GB of raw emails, and have to categorize these mails into 15 categories.
The problem I have is that the size limit of the file which will be used to train the model cannot be more than 40 MB.
So I want to filter out the mails for each category which best represent the whole category.
For example: for a category A, there are 100 emails in the dataset; due to the size limitation I want to keep only 10 emails which represent the maximum features of all 100 emails.
I read that TF-IDF can be used to do this: for each category, create a corpus of all the emails for that particular category and then try to find the emails that best represent it, but I'm not sure how to do that. A code snippet would be of great help.
Plus, there are a lot of junk words and hash values in the dataset. Should I clean all of those? Even if I try, it's a lot to clean and hard to do manually.
TF-IDF stands for Term Frequency - Inverse Document Frequency. The idea is to find out which words are more representative based on generality and specificity.
The approach that was proposed to you is not bad and could work as a shallow approach. Here's a snippet to help you understand how to do it:
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np
## Suppose Docs1 and Docs2 are the groups of e-mails. Notice that docs1 has more lines than docs2
docs1 = ['In digital imaging, a pixel, pel,[1] or picture element[2] is a physical point in a raster image, or the smallest addressable element in an all points addressable display device; so it is the smallest controllable element of a picture represented on the screen',
'Each pixel is a sample of an original image; more samples typically provide more accurate representations of the original. The intensity of each pixel is variable. In color imaging systems, a color is typically represented by three or four component intensities such as red, green, and blue, or cyan, magenta, yellow, and black.',
'In some contexts (such as descriptions of camera sensors), pixel refers to a single scalar element of a multi-component representation (called a photosite in the camera sensor context, although sensel is sometimes used),[3] while in yet other contexts it may refer to the set of component intensities for a spatial position.',
'The word pixel is a portmanteau of pix (from "pictures", shortened to "pics") and el (for "element"); similar formations with \'el\' include the words voxel[4] and texel.[4]',
'The word "pixel" was first published in 1965 by Frederic C. Billingsley of JPL, to describe the picture elements of video images from space probes to the Moon and Mars.[5] Billingsley had learned the word from Keith E. McFarland, at the Link Division of General Precision in Palo Alto, who in turn said he did not know where it originated. McFarland said simply it was "in use at the time" (circa 1963).[6]'
]
docs2 = ['In applied mathematics, discretization is the process of transferring continuous functions, models, variables, and equations into discrete counterparts. This process is usually carried out as a first step toward making them suitable for numerical evaluation and implementation on digital computers. Dichotomization is the special case of discretization in which the number of discrete classes is 2, which can approximate a continuous variable as a binary variable (creating a dichotomy for modeling purposes, as in binary classification).',
'Discretization is also related to discrete mathematics, and is an important component of granular computing. In this context, discretization may also refer to modification of variable or category granularity, as when multiple discrete variables are aggregated or multiple discrete categories fused.',
'Whenever continuous data is discretized, there is always some amount of discretization error. The goal is to reduce the amount to a level considered negligible for the modeling purposes at hand.',
'The terms discretization and quantization often have the same denotation but not always identical connotations. (Specifically, the two terms share a semantic field.) The same is true of discretization error and quantization error.'
]
## We sum them up to have a universal TF-IDF dictionary, so that we can 'compare oranges to oranges'
docs3 = docs1+docs2
## Using Sklearn TfIdfVectorizer - it is easy and straight forward!
vectorizer = TfidfVectorizer()
## Now we make the universal TF-IDF dictionary, MAKE SURE TO USE THE MERGED LIST AND fit() [not fit_transform]
X = vectorizer.fit(docs3)
## Checking the array shapes after using transform (fitting them to the tf-idf dictionary)
## Notice that they are the same size but with distinct number of lines
print(X.transform(docs1).toarray().shape, X.transform(docs2).toarray().shape)
(5, 221) (4, 221)
## Now, to "merge" them all, there are many ways to do it - here I used a simple "mean" method.
transformed_docs1 = np.mean(X.transform(docs1).toarray(), axis=0)
transformed_docs2 = np.mean(X.transform(docs2).toarray(), axis=0)
print(transformed_docs1)
print(transformed_docs2)
[0.02284796 0.02284796 0.02805426 0.06425141 0. 0.03212571
0. 0.03061173 0.02284796 0. 0. 0.04419432
0.08623564 0. 0. 0. 0.03806573 0.0385955
0.04569592 0. 0.02805426 0.02805426 0. 0.04299283
...
0. 0.02284796 0. 0.05610853 0.02284796 0.03061173
0. 0.02060219 0. 0.02284796 0.04345487 0.04569592
0. 0. 0.02284796 0. 0.03061173 0.02284796
0.04345487 0.07529817 0.04345487 0.02805426 0.03061173]
## These are the final Shapes.
print(transformed_docs1.shape, transformed_docs2.shape)
(221,) (221,)
About removing junk words: TF-IDF averages rare words out (such as numbers etc.); if a word is too rare, it won't matter much. But junk words can increase the size of your input vectors a lot, so I'd advise you to find a way to clean them. Also, consider some NLP preprocessing steps, such as lemmatization, to reduce dimensionality.
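If the goal is to keep only the N mails that best represent a category, one possible follow-up (a sketch building on the averaging idea above; the function name and the toy data are mine, not from the question) is to rank every mail of the category by cosine similarity to the category's mean TF-IDF vector and keep the top N:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

def most_representative(emails, n=10):
    """Return the n emails closest (by cosine) to the category's mean TF-IDF vector."""
    vectorizer = TfidfVectorizer(stop_words='english')
    X = vectorizer.fit_transform(emails)             # shape: (num_emails, vocab_size)
    centroid = np.asarray(X.mean(axis=0))            # mean TF-IDF vector of the category
    sims = cosine_similarity(X, centroid).ravel()    # similarity of each email to the centroid
    top = np.argsort(sims)[::-1][:n]                 # indices of the n most central emails
    return [emails[i] for i in top]

# Toy example; replace category_a with the real emails of one category.
category_a = ["meeting tomorrow at noon", "please reschedule the meeting",
              "invoice attached", "meeting notes from today"]
print(most_representative(category_a, n=2))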

is ORB+BFMatcher a good match for recognizing repetitive images (with slight variations?)

I need to recognize images of hand-written numerals with known values. The physical objects bearing the numbers are always identical but come with slight variations in position/scale/lighting. There are about 100 of them, each about 100x500 px in size.
In the first pass, the code should "learn" possible inputs, and then recognize them (classify them as being close to one of the "training" images) when they come again.
I was mostly following the Feature Matching Python-OpenCV tutorial
Input images are analyzed first, keypoints & descriptors are remembered in the orbTrained list:
import cv2
import collections

ORBTrained=collections.namedtuple('ORBTrained',['kp','des','img'])
orbTrained=[]
for img in trainingImgs:
    z2=preprocessImg(img)
    orb=cv2.ORB_create(nfeatures=400,patchSize=30,edgeThreshold=0)
    kp,des=orb.detectAndCompute(z2,None)
    orbTrained.append(ORBTrained(kp=kp,des=des,img=z2))
    z3=cv2.drawKeypoints(z2,kp,None,color=(0,255,0),flags=0)
A typical result of this first stage looks like this:
Then in the next loop, for each real input image, cycle through all training images to see which is matching the best:
ORBMatch=collections.namedtuple('ORBMatch',['dist','match','train'])
for img in inputImgs:
    z2=preprocessNum(img)
    orb=cv2.ORB_create(nfeatures=400,patchSize=30,edgeThreshold=0)
    kp,des=orb.detectAndCompute(z2,None)
    bf=cv2.BFMatcher(cv2.NORM_HAMMING,crossCheck=True)
    mm=[]
    for train in orbTrained:
        m=bf.match(des,train.des)
        dist=sum([m_.distance for m_ in m])
        mm.append(ORBMatch(dist=dist,match=m,train=train))
    # sort matching images based on score
    mm.sort(key=lambda m: m.dist)
    print([m.dist for m in mm[:5]])
    best=mm[0]
    best.match.sort(key=lambda x:x.distance) # sort matches in the best match
    z3=cv2.drawMatches(z2,kp,best.train.img,best.train.kp,best.match[:50],None,flags=2)
The result I get is nonsensical, and consistently so (only when I run with pixel-identical input is the result correct):
What is the problem? Am I completely misunderstanding what to do, or do I just need to tune some parameters?
First, are you sure you're not reinventing the wheel by creating your own OCR library? There are many free frameworks, some of which support training with custom character sets.
Second, you should understand what feature matching is. It will find similar small areas, but isn't aware of other feature pairs. It will match similar corners of characters, not the characters themselves. You might experiment with a larger patchSize so that it covers at least half of the digit.
You can minimize false pairs by running feature detection on only a single digit at a time, using thresholding and contours to find character bounds (see the sketch after this answer).
If the text isn't rotated, using a rotation-invariant feature descriptor such as ORB isn't the best option; try a rotation-variant descriptor, such as FAST.
According to the authors of the papers (ORB and ORB-SLAM), ORB is invariant to rotation and scale "in a certain range". Maybe you should first try matching with only a small scale or rotation change.
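As a rough sketch of the "one digit at a time" idea above (my own code, not from the answer; it assumes dark digits on a light background, the OpenCV 4.x findContours signature, and a noise threshold you will need to tune):

import cv2

def extract_digit_regions(img_gray, min_area=50):
    """Threshold the image and return cropped digit regions, left to right."""
    # Otsu threshold; THRESH_BINARY_INV assumes dark digits on a light background.
    _, binary = cv2.threshold(img_gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # OpenCV 4.x: findContours returns (contours, hierarchy)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours]
    boxes = [b for b in boxes if b[2] * b[3] >= min_area]   # drop specks of noise
    boxes.sort(key=lambda b: b[0])                          # left-to-right order
    return [img_gray[y:y + h, x:x + w] for (x, y, w, h) in boxes]

# Each returned crop can then be matched (ORB, template matching, ...) on its own.
img = cv2.imread("numerals.png", cv2.IMREAD_GRAYSCALE)     # hypothetical file name
for i, digit in enumerate(extract_digit_regions(img)):
    cv2.imwrite("digit_%d.png" % i, digit)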

The interpretation of scikit-image SSIM (structural similarity image metric) negative values

I'm using scikit-image SSIM to compare the similarity between two images. The thing is that I get negative values, which are not favorable for my purpose. I understand that the range of SSIM values are supposed to be between -1 to 1, but I need to get a positive value only, and I want this value to reduce as the similarity increases between the two images. I've been thinking of two ways to handle this issue. First, subtracting SSIM value from 1:
Similarity Measure=(1-SSIM)
Now, it gives zero in the case of a perfect match (SSIM=1) and 1 when there is no similarity (SSIM=0). But, since SSIM also produces negative values between -1 and 0, I also get values larger than 1, which I don't know how to interpret. In particular, I don't know what it means when SSIM returns negative values. Are images with SSIM values between -1 and 0 less similar than images with an SSIM of 0? Because, if that is not the case, then my similarity measure will cause problems (it produces values greater than 1 when SSIM is negative, which would mean less similarity compared to the case of SSIM=0).
Another measure that I was thinking to use is structural dissimilarity (DSSIM), which is defined as follows:
DSSIM=(1-SSIM)/2
This will return 0 when the two images are exactly the same (which is what I'm looking for), 1 when SSIM=-1 (no similarity at all), and 1/2 when SSIM=0. Again, this is only useful if a negative SSIM indicates less similarity than SSIM=0, which, as I mentioned, is something I don't know and couldn't find explained anywhere in terms of what each SSIM value corresponds to in the level of similarity between the two images. I hope someone can help me with such an interpretation, or with some way to get only values between 0 and 1 for SSIM.
Edit: As I mentioned in the comments, SSIM can be negative, and this is caused by the covariance of the two images, which can be negative. In the skimage SSIM source code, the covariance of the two images is represented by vxy, and it can be negative in some cases. As for the interpretation of negative SSIM values in terms of similarity, I'm still not sure, but this paper states that this happens when the local image structure is inverted. Still, I saw this for images that do not look like inverted versions of each other. But I guess "local" is important here, meaning that two images might not look like inverted versions of each other overall, yet their structure is inverted locally. Is this the right interpretation?
Yes, the similarity of two images with SSIM = 0 is better than SSIM = -1, so you can use:
1 - (1 + SSIM ) / 2
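A minimal sketch of that mapping with scikit-image (recent versions expose SSIM as skimage.metrics.structural_similarity, older ones as skimage.measure.compare_ssim; the toy images below are mine):

import numpy as np
from skimage.metrics import structural_similarity as ssim

# Two toy float images in [0, 1]; replace with your own arrays of identical shape.
rng = np.random.default_rng(0)
img_a = rng.random((64, 64))
img_b = 1.0 - img_a                      # locally inverted structure -> negative SSIM

score = ssim(img_a, img_b, data_range=1.0)
dssim = (1.0 - score) / 2.0              # 0 for identical images, up to 1 for SSIM = -1
print("SSIM = %.3f, DSSIM = %.3f" % (score, dssim))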

SURF interest point parameters

I want to give alternative interest points as input to SURF using the -p1 option (I'm using the authors' implementation: http://www.vision.ee.ethz.ch/~surf/download.html). But I'm not sure what to make of the parameters.
I need to give x, y, a, b, c for each interest point, and according to the README, a=c and radius = 1/a^2 (with [a,b;b,c] being the entries of the second moment matrix). But when I look at an output file of SURF's interest point detection, the a,c parameters are always very small (e.g. 0.003). If radius = 1/a^2, then that would give a region radius of 1/(0.003^2) > 100,000 pixels. Am I misinterpreting the README file, or are the a,c parameters that SURF returns incorrect?
I think the README file is misleading. If you look at the code, it's actually a = 1/radius^2. That puts the radius at around 20 pixels in your example. Go through main.cpp in the library to see how a is calculated.
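A quick sanity check of that reading (a = 1/radius^2, i.e. radius = 1/sqrt(a)), using the 0.003 value from the question:

import math

a = 0.003                       # typical 'a' value from SURF's interest point output
radius = 1.0 / math.sqrt(a)     # reading "a = 1/radius^2" from the code
print(radius)                   # ~18.3 pixels, a plausible region size
# (the README reading "radius = 1/a^2" would instead give ~111111 pixels)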
Krish is probably right about the radius; unfortunately I don't remember. As for the other parameters you can use:
double image size: -d
This is good if you need high-precision interest points and descriptors for e.g. 3D reconstruction. If you use your own interest points, you can try -d to use smaller descriptor regions (only if you are sure that your interest points have a high precision).
custom lobe size: -ms 3
This defines the lobe size of the interest point detector. You don't need that if you have your own interest points.
number of octaves: -oc 4
This determines how many scales you want to analyze. If you use your own interest points, this is not needed.
initial sampling step: -ss 2
Sampling step for the Hessian detector. Not needed if you use your own interest points.
U-SURF (not rotation invariant): -u
This might be interesting for you. It does not use orientation invariance, which makes it faster for image sets taken with an upright camera, as on robots, for example.
extended descriptor (SURF-128): -e
Use the extended descriptor if you want to do 3D reconstruction and robust point matching. Somehow, it does not work so well for object recognition. Use the smaller descriptor for object recognition.
descriptor size: -in 4
This defines the number of squares in the descriptor window (default 4x4). If you reduce this number to e.g. 2, it will produce a 16-dimensional descriptor, which is not so bad for object recognition.
Hope that helps.

Simple and fast method to compare images for similarity

I need a simple and fast way to compare two images for similarity. I.e. I want to get a high value if they contain exactly the same thing but may have a slightly different background and may be moved / resized by a few pixels.
(More concrete, if that matters: The one picture is an icon and the other picture is a subarea of a screenshot and I want to know if that subarea is exactly the icon or not.)
I have OpenCV at hand but I am still not that used to it.
One possibility I thought about so far: Divide both pictures into 10x10 cells and for each of those 100 cells, compare the color histogram. Then I can set some made up threshold value and if the value I get is above that threshold, I assume that they are similar.
I haven't yet tried how well that works, but I guess it would be good enough. The images are already pretty similar (in my use case), so I can use a pretty high threshold value.
I guess there are dozens of other possible solutions for this which would work more or less (as the task itself is quite simple as I only want to detect similarity if they are really very similar). What would you suggest?
There are a few very related / similar questions about obtaining a signature/fingerprint/hash from an image:
OpenCV / SURF How to generate a image hash / fingerprint / signature out of the descriptors?
Image fingerprint to compare similarity of many images
Near-Duplicate Image Detection
OpenCV: Fingerprint Image and Compare Against Database.
Also, I stumbled upon these implementations which have such functions to obtain a fingerprint:
pHash
imgSeek (GitHub repo) (GPL) based on the paper Fast Multiresolution Image Querying
image-match. Very similar to what I was searching for. Similar to pHash, based on An image signature for any kind of image, Goldberg et al. Uses Python and Elasticsearch.
iqdb
ImageHash. supports pHash.
Image Deduplicator (imagededup). Supports CNN, PHash, DHash, WHash, AHash.
Some discussions about perceptual image hashes: here
A bit off-topic: there exist many methods to create audio fingerprints. MusicBrainz, a web service which provides fingerprint-based lookup for songs, has a good overview in their wiki. They are using AcoustID now. This is for finding exact (or mostly exact) matches. For finding similar matches (or if you only have some snippets or a lot of noise), take a look at Echoprint. A related SO question is here. So it seems like this is solved for audio. All these solutions work quite well.
A somewhat more generic question about fuzzy search in general is here. E.g. there is locality-sensitive hashing and nearest neighbor search.
Can the screenshot or icon be transformed (scaled, rotated, skewed ...)? There are quite a few methods off the top of my head that could possibly help you:
Simple Euclidean distance, as mentioned by @carlosdc (doesn't work with transformed images and you need a threshold).
(Normalized) Cross Correlation - a simple metric which you can use for comparing image areas. It's more robust than the simple Euclidean distance but doesn't work on transformed images, and you will again need a threshold.
Histogram comparison - if you use normalized histograms, this method works well and is not affected by affine transforms. The problem is determining the correct threshold. It is also very sensitive to color changes (brightness, contrast etc.). You can combine it with the previous two.
Detectors of salient points/areas - such as MSER (Maximally Stable Extremal Regions), SURF or SIFT. These are very robust algorithms and they might be too complicated for your simple task. Good thing is that you do not have to have an exact area with only one icon, these detectors are powerful enough to find the right match. A nice evaluation of these methods is in this paper: Local invariant feature detectors: a survey.
Most of these are already implemented in OpenCV - see for example the cvMatchTemplate method (which performs template matching, e.g. via normalized cross-correlation): http://dasl.mem.drexel.edu/~noahKuntz/openCVTut6.html; a minimal sketch follows below. The salient point/area detectors are also available - see OpenCV Feature Detection.
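For the icon-inside-a-screenshot case specifically, here is a minimal normalized cross-correlation sketch with OpenCV's matchTemplate (file names and the 0.9 threshold are placeholders you would have to tune):

import cv2

screenshot = cv2.imread("screenshot.png")   # hypothetical file names
icon = cv2.imread("icon.png")

# Normalized cross-correlation: 1.0 means a perfect match at that location.
result = cv2.matchTemplate(screenshot, icon, cv2.TM_CCOEFF_NORMED)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)

THRESHOLD = 0.9   # made-up value; tune for your icons
if max_val >= THRESHOLD:
    print("icon found at %s (score %.3f)" % (max_loc, max_val))
else:
    print("no match (best score %.3f)" % max_val)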
I faced the same issue recently. To solve this problem (a simple and fast algorithm to compare two images) once and for all, I contributed an img_hash module to opencv_contrib; you can find the details at this link.
The img_hash module provides six image hash algorithms and is quite easy to use.
Code example
[Images: original lena, blurred lena, resized lena, shifted lena]
#include <opencv2/core.hpp>
#include <opencv2/core/ocl.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/img_hash.hpp>
#include <opencv2/imgproc.hpp>

#include <iostream>

void compute(cv::Ptr<cv::img_hash::ImgHashBase> algo)
{
    auto input = cv::imread("lena.png");
    cv::Mat similar_img;

    //detect similar image after blur attack
    cv::GaussianBlur(input, similar_img, {7,7}, 2, 2);
    cv::imwrite("lena_blur.png", similar_img);
    cv::Mat hash_input, hash_similar;
    algo->compute(input, hash_input);
    algo->compute(similar_img, hash_similar);
    std::cout<<"gaussian blur attack : "<<
               algo->compare(hash_input, hash_similar)<<std::endl;

    //detect similar image after shift attack
    similar_img.setTo(0);
    input(cv::Rect(0,10, input.cols,input.rows-10)).
          copyTo(similar_img(cv::Rect(0,0,input.cols,input.rows-10)));
    cv::imwrite("lena_shift.png", similar_img);
    algo->compute(similar_img, hash_similar);
    std::cout<<"shift attack : "<<
               algo->compare(hash_input, hash_similar)<<std::endl;

    //detect similar image after resize
    cv::resize(input, similar_img, {120, 40});
    cv::imwrite("lena_resize.png", similar_img);
    algo->compute(similar_img, hash_similar);
    std::cout<<"resize attack : "<<
               algo->compare(hash_input, hash_similar)<<std::endl;
}

int main()
{
    using namespace cv::img_hash;

    //disabling opencl acceleration may (or may not) speed up img_hash
    cv::ocl::setUseOpenCL(false);

    //if the value returned by compare is <= 8, the images are
    //very similar to each other
    compute(ColorMomentHash::create());

    //there are other algorithms you can try out;
    //every algorithm has its pros and cons
    compute(AverageHash::create());
    compute(PHash::create());
    compute(MarrHildrethHash::create());
    compute(RadialVarianceHash::create());

    //BlockMeanHash supports mode 0 and mode 1, which correspond to
    //mode 1 and mode 2 of the PHash library
    compute(BlockMeanHash::create(0));
    compute(BlockMeanHash::create(1));
}
In this case, ColorMomentHash gives us the best result:
gaussian blur attack : 0.567521
shift attack : 0.229728
resize attack : 0.229358
Pros and cons of each algorithm
The performance of img_hash is good too
Speed comparison with the PHash library (100 images from ukbench)
If you want to know the recommended thresholds for these algorithms, please check this post: http://qtandopencv.blogspot.my/2016/06/introduction-to-image-hash-module-of.html.
If you are interested in how I measure the performance of the img_hash module (including speed and different attacks), please check this link: http://qtandopencv.blogspot.my/2016/06/speed-up-image-hashing-of-opencvimghash.html.
Does the screenshot contain only the icon? If so, the L2 distance of the two images might suffice. If the L2 distance doesn't work, the next step is to try something simple and well established, like Lucas-Kanade, which I'm sure is available in OpenCV.
If you want an index of the similarity of the two pictures, I suggest the SSIM index. It is more consistent with the human eye. Here is an article about it: Structural Similarity Index
It is implemented in OpenCV too, and it can be accelerated with GPU: OpenCV SSIM with GPU
If you can be sure to have precise alignment of your template (the icon) to the testing region, then any old sum of pixel differences will work.
If the alignment is only going to be a tiny bit off, then you can low-pass both images with cv::GaussianBlur before finding the sum of pixel differences (a minimal sketch follows below).
If the quality of the alignment is potentially poor then I would recommend either a Histogram of Oriented Gradients or one of OpenCV's convenient keypoint detection/descriptor algorithms (such as SIFT or SURF).
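For the "tiny bit off" case mentioned above, here is a minimal sketch that low-pass filters both images before taking pixel differences (my own code; the kernel size is a guess and both images must have the same size):

import cv2
import numpy as np

def blurred_difference(a, b, ksize=(5, 5)):
    """Low-pass both images, then return the mean absolute pixel difference."""
    a_blur = cv2.GaussianBlur(a, ksize, 0)
    b_blur = cv2.GaussianBlur(b, ksize, 0)
    return float(np.mean(cv2.absdiff(a_blur, b_blur)))

# Lower score = more similar; hypothetical file names.
icon = cv2.imread("icon.png")
patch = cv2.imread("patch.png")
print(blurred_difference(icon, patch))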
For matching identical images, here is code for the L2 distance:
// Compare two images by getting the L2 error (square-root of sum of squared error).
double getSimilarity( const Mat A, const Mat B ) {
    if ( A.rows > 0 && A.rows == B.rows && A.cols > 0 && A.cols == B.cols ) {
        // Calculate the L2 relative error between images.
        double errorL2 = norm( A, B, CV_L2 );
        // Convert to a reasonable scale, since L2 error is summed across all pixels of the image.
        double similarity = errorL2 / (double)( A.rows * A.cols );
        return similarity;
    }
    else {
        // Images have a different size
        return 100000000.0;  // Return a bad value
    }
}
Fast. But not robust to changes in lighting/viewpoint etc.
Source
If you want to compare images for similarity, I suggest you use OpenCV. In OpenCV, there is feature matching and template matching. For feature matching, there are detectors such as SURF, SIFT, FAST and so on. You can use these to detect, describe and then match the images. After that, you can use a specific index to find the number of matches between the two images.
Hu invariant moments are a very powerful tool for comparing two images.
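A minimal sketch of a Hu-moment comparison with OpenCV (my own code; cv2.matchShapes computes a distance from the Hu invariant moments of two binary images or contours, and smaller means more similar; file names are placeholders):

import cv2

# Load two shapes as grayscale and binarize them before comparing.
a = cv2.imread("shape_a.png", cv2.IMREAD_GRAYSCALE)
b = cv2.imread("shape_b.png", cv2.IMREAD_GRAYSCALE)
_, a_bin = cv2.threshold(a, 128, 255, cv2.THRESH_BINARY)
_, b_bin = cv2.threshold(b, 128, 255, cv2.THRESH_BINARY)

# matchShapes compares images (or contours) via their Hu invariant moments.
distance = cv2.matchShapes(a_bin, b_bin, cv2.CONTOURS_MATCH_I1, 0.0)
print(distance)

# The raw Hu moments themselves are also available:
print(cv2.HuMoments(cv2.moments(a_bin)).flatten())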
Hash functions are used in the undouble library to detect (near-)identical images (disclaimer: I am also the author). This is a simple and fast way to compare two or more images for similarity. It works using a multi-step process: pre-processing the images (grayscaling, normalizing, and scaling), computing the image hash, and grouping images based on a threshold value.
