Difference between pHash nearest neighbor and color histogram nearest neighbor?

I'm looking to create an animated series of photos that are as similar as possible. While researching, I've come across two methods:
Generate a pHash of the images and do a nearest neighbor using the Hamming distance of the hash.
Create color histograms and do an n-dimensional nearest neighbor using Euclidean distance.
Many commenters on the https://stackoverflow.com/questions/6971966/how-to-measure-percentage-similarity-between-two-images question claim that the two processes are essentially the same. I'm looking for a little more insight on this, since they seem like different processes.
Thoughts?
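For concreteness, here is a minimal sketch of how the two distances are typically computed (assuming the imagehash and opencv-python packages; the function names and file paths are placeholders, not anything from the question):

    import cv2
    import imagehash
    import numpy as np
    from PIL import Image

    def phash_distance(path_a, path_b):
        # 64-bit DCT-based fingerprints; subtracting two ImageHash objects
        # returns the Hamming distance between them.
        return imagehash.phash(Image.open(path_a)) - imagehash.phash(Image.open(path_b))

    def histogram_distance(path_a, path_b, bins=8):
        # 8x8x8 BGR colour histograms compared with Euclidean distance.
        hists = []
        for p in (path_a, path_b):
            img = cv2.imread(p)
            h = cv2.calcHist([img], [0, 1, 2], None, [bins] * 3, [0, 256] * 3)
            hists.append(cv2.normalize(h, h).flatten())
        return float(np.linalg.norm(hists[0] - hists[1]))

As the sketch suggests, the pHash distance compares bit patterns of a small DCT-based fingerprint (rough structure and layout), while the histogram distance only compares global colour distributions, which is one concrete way to see that the two processes are not the same thing.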

Related

How to 3d reconstruct robustly from multiple images with known poses in OpenCV

The traditional solution for high-resolution images is, for example:
extract features (dense) for all images
match features to find tracks through images
triangulate features to 3d points.
I can give two problems here for my case (many 640x480 images with small movements between each other). First: matching is very slow, especially if the number of images is big, so a better solution could be optical-flow tracking; but that gets sparse with big moves (a mix of the two could solve the problem!).
Second: triangulating the tracks. Even though it is an over-determined problem, I find it hard to code a solution (here I am asking for a simplification of what I read in the references).
I searched quite a bit for libraries in that direction, with no useful result.
Again, I have ground-truth camera matrices and need only the 3D positions as a first estimate (without bundle adjustment).
A coded software solution would be of great help, as I don't want to reinvent the wheel, though detailed instructions may also be helpful.
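Since the projection matrices are known, the triangulation step can be sketched roughly as follows. This is a hedged example assuming OpenCV and NumPy; P_a, P_b, Ps, pts_a, pts_b and pts are placeholder names for the ground-truth 3x4 projection matrices and the matched pixel coordinates of one track:

    import cv2
    import numpy as np

    def triangulate_two_views(P_a, P_b, pts_a, pts_b):
        # pts_a, pts_b: Nx2 matched pixel coordinates in two views.
        X_h = cv2.triangulatePoints(P_a, P_b,
                                    pts_a.T.astype(np.float64),
                                    pts_b.T.astype(np.float64))  # 4xN homogeneous
        return (X_h[:3] / X_h[3]).T                              # Nx3 Euclidean points

    def triangulate_track(Ps, pts):
        # Over-determined case: one track observed in several views.
        # Stack two DLT rows per observation and solve the system by SVD.
        A = []
        for P, (u, v) in zip(Ps, pts):
            A.append(u * P[2] - P[0])
            A.append(v * P[2] - P[1])
        _, _, Vt = np.linalg.svd(np.asarray(A))
        X = Vt[-1]
        return X[:3] / X[3]

This linear (DLT) solution is only a first estimate, which matches the question's requirement of no bundle adjustment.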
This is basically the underlying geometry for estimating the depth.
As you said, we have the camera poses Q. Pick a point X in the world; X_L is its projection on the left image. With Q_L, Q_R and X_L we can construct the (green) epipolar plane, and the rest of the job is easy: we search through points on the ray (Q_L, X), which exactly parameterises the depth of X_L. With different depth hypotheses X1, X2, ..., we get different projections on the right image.
Now we compare the pixel-intensity difference between X_L and each reprojected point on the right image, pick the smallest one, and the corresponding depth is exactly what we want.
Pretty easy, hey? In truth it's way harder: the image intensity along the epipolar line is never strictly convex.
This makes matching extremely hard, since the non-convexity gives any distance function multiple critical points (candidate matches). How do you decide which one is correct?
To handle this, people proposed patch-based matching: methods like SAD, SSD and NCC were introduced to make the matching cost as convex as possible. Still, they cannot handle large-scale repeated texture or low-texture regions.
To solve this, people started to search over a long range along the epipolar line, and found that the whole set of matching scores can be described as a distribution along the depth.
With depth on the horizontal axis and the matching-metric score on the vertical axis, this leads to the depth filter. We usually model this distribution as a Gaussian (a Gaussian depth filter) and use it to describe the uncertainty of the depth; combined with the patch-matching method, we can roughly get a proposal.
Now we can use optimization tools such as Gauss-Newton or gradient descent to finally refine the depth estimation.
To sum up, the total process of depth estimation follows these steps (a brute-force sketch of the search in steps 2 and 3 follows the list):
1. assume the depth at every pixel follows an initial Gaussian distribution
2. search along the epipolar line and reproject points into the target frame
3. triangulate the depth and calculate its uncertainty from the depth filter
4. run steps 2 and 3 again to get a new depth distribution and merge it with the previous one; if it has converged, stop, otherwise start again from step 2.
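A brute-force, NumPy-only sketch of the search in steps 2 and 3, under stated assumptions: ref_img and tgt_img are grayscale arrays, K the shared intrinsics, T_tgt_ref the 4x4 relative pose, uv an interior pixel, and all names are placeholders rather than anything from the answer above:

    import numpy as np

    def depth_sweep(ref_img, tgt_img, K, T_tgt_ref, uv, depths, patch=3):
        # Sample candidate depths along the reference pixel's viewing ray,
        # reproject each hypothesis into the target frame and keep the depth
        # whose patch has the smallest SAD matching cost.
        Kinv = np.linalg.inv(K)
        u, v = uv
        ray = Kinv @ np.array([u, v, 1.0])          # viewing ray in the reference camera
        h = patch // 2
        ref_patch = ref_img[v - h:v + h + 1, u - h:u + h + 1].astype(np.float32)
        best_depth, best_cost = None, np.inf
        for d in depths:
            X_ref = ray * d                          # 3D hypothesis in the reference frame
            X_tgt = T_tgt_ref[:3, :3] @ X_ref + T_tgt_ref[:3, 3]
            if X_tgt[2] <= 0:
                continue
            p = K @ (X_tgt / X_tgt[2])               # projection into the target image
            x, y = int(round(p[0])), int(round(p[1]))
            if not (h <= x < tgt_img.shape[1] - h and h <= y < tgt_img.shape[0] - h):
                continue
            tgt_patch = tgt_img[y - h:y + h + 1, x - h:x + h + 1].astype(np.float32)
            cost = np.abs(ref_patch - tgt_patch).sum()   # SAD matching metric
            if cost < best_cost:
                best_cost, best_depth = cost, d
        return best_depth

A real depth filter would additionally keep a Gaussian (mean, variance) per pixel and fuse each new measurement into it, as step 4 describes.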

K-means clustering with a different distance function in Lab space

Problem:
To cluster similar-colour pixels in CIE Lab space using k-means.
I want to use CIE94 as the distance between two pixels.
Formula of CIE94: delta E_94 = sqrt( (dL / (k_L * S_L))^2 + (dC / (k_C * S_C))^2 + (dH / (k_H * S_H))^2 ), with S_L = 1, S_C = 1 + K1 * C1, S_H = 1 + K2 * C1.
What I read is that k-means works in "Euclidean space", where the positional coordinates are minimised by a cost function which is the sum of squared differences.
The reason for not using k-means in a space other than Euclidean is:
"The algorithm is often presented as assigning objects to the nearest cluster by distance. The standard algorithm aims at minimizing the within-cluster sum of squares (WCSS) objective, and thus assigns by 'least sum of squares', which is exactly equivalent to assigning by the smallest Euclidean distance. Using a different distance function other than (squared) Euclidean distance may stop the algorithm from converging." (source: Wikipedia)
So how can I use the CIE94 distance in Lab space to cluster similar colours?
How should I approach the problem? What should the minimisation function be here? How do I map Euclidean space to Lab space so that the k-means Euclidean formulation works? Is there any other approach?
The reason that CIE Lab is often used for clustering is that it reduces the color to 2 dimensions (as opposed to RGB with 3 color channels). You can easily think of the color of each pixel in a Cartesian coordinate system: instead of points (x, y) you have points (a, b). From here you simply perform a 2D k-means.
Exactly how you implement k-means is up to you. The nice thing about reducing colors to a 2D space is that we can imagine the data on a grid and use any regular distance measure we want: Mahalanobis, Euclidean, 1-norm, city block, etc. The possibilities are really endless here.
You don't have to use CIE Lab; you can just as easily use YCbCr, YUV, or any other color space that represents color in 2 dimensions. If you wanted to try a 3D k-means you could use RGB, HSV, etc. One problem with higher dimensionality is sparsity of clusters (large variance) and, most importantly, increased computation time.
Just for fun I've included two images clustered using k-means, one in Lab and one in YCbCr. You can see the clustering is nearly identical (except that the labels differ), which shows that the exact color space is largely irrelevant; the main point is to match the dimensionality of your k-means with that of your data.
EDIT
You made some good points in your comments. I was merely demonstrating that by abstracting the problem you can imagine many variations of the same basic clustering algorithm. But you are right, there are advantages to using CIE Lab.
Back to the distance measure. K-means has two steps, assignment and update (it is very similar to the Expectation-Maximization algorithm). The distance is used in the assignment step of k-means. Here is some pseudocode:
    # assignment step: give every pixel the label of its nearest cluster centre
    for i in range(rows * cols):
        dist = [calculate_distance(pixels[i], mu[k]) for k in range(K)]
        labels[i] = dist.index(min(dist))
You would create a function calculate_distance that uses the delta E calculation from CIE94. That formula uses all 3 channels to compute the distance. Hopefully this answers your questions.
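A runnable version of that assignment step, assuming scikit-image is available for the CIE94 delta E; pixels_lab is an (N, 3) array of Lab values and mu the current (k, 3) cluster centres, and both names are placeholders:

    import numpy as np
    from skimage.color import deltaE_ciede94

    def assign_labels(pixels_lab, mu):
        # Distance of every pixel to every centre under delta E 94; shape (N, k).
        dists = np.stack(
            [deltaE_ciede94(np.broadcast_to(c, pixels_lab.shape), pixels_lab)
             for c in mu], axis=1)
        return np.argmin(dists, axis=1)

Note that the usual mean-based update step is no longer guaranteed to decrease this objective, which is exactly the convergence caveat quoted from Wikipedia above.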
NOTE
My examples only use the 2 color channels, ignoring the luminance channel. I used this technique since often the goal is to group colors despite lighting disparities (such as shadows). The delta E measure is not lighting invariant. This may or may not be a concern for your application, but it is something to keep in mind.
Results using squared Euclidean distance:
Results using city-block distance:
There are k-means variations for other distance functions.
In particular k-medoids (PAM) works with arbitrary distance functions.
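For instance, a small self-contained PAM-style sketch under the CIE94 delta E (assuming scikit-image for deltaE_ciede94; kmedoids_lab and its arguments are hypothetical names, and since it builds the full pairwise distance matrix it is only practical on a subsample of pixels):

    import numpy as np
    from skimage.color import deltaE_ciede94

    def kmedoids_lab(pixels_lab, k, n_iter=20, seed=0):
        rng = np.random.default_rng(seed)
        n = len(pixels_lab)
        # Full pairwise delta E matrix between the sampled pixels.
        D = np.stack([deltaE_ciede94(np.broadcast_to(p, pixels_lab.shape), pixels_lab)
                      for p in pixels_lab])
        medoids = rng.choice(n, size=k, replace=False)
        for _ in range(n_iter):
            labels = np.argmin(D[:, medoids], axis=1)
            new_medoids = medoids.copy()
            for j in range(k):
                members = np.where(labels == j)[0]
                if len(members):
                    # The member with the smallest total distance to the other members.
                    new_medoids[j] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
            if np.array_equal(new_medoids, medoids):
                break
            medoids = new_medoids
        return np.argmin(D[:, medoids], axis=1), pixels_lab[medoids]

Because the cluster representatives are actual data points chosen by the distance itself, no Euclidean-mean update is needed, which is why arbitrary distances are safe here.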

Comparing histograms with chi-squared distance

Different papers/libraries seem to compute the chi-squared distance in different ways; for instance, OpenCV expresses it one way, while this paper expresses it in a different manner.
My first question is: what is the difference between the two formulas, i.e. why does one formula divide by the value of one bin while the other divides by the sum of the two bins?
Secondly, should the histograms be normalized, and if so, why? The chi-squared statistic doesn't require it, but the general consensus is to normalize the histogram before using a chi-squared distance.
The documentation is wrong; the implementation inside OpenCV is correct. Take a look at this bug report.
Also, normalising a histogram does not really change its pattern or "shape"; only the scale is brought down. So as long as you're working independently of scale, which you probably are if you're looking at how much one histogram "resembles" another, normalising should only make calculations faster (hopefully).
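For reference, the two variants the question mentions look roughly like this in NumPy (a hedged sketch; eps just guards against empty bins):

    import numpy as np

    def chi2_onesided(h1, h2, eps=1e-10):
        # Divides by the bins of the first histogram (the form OpenCV uses
        # for HISTCMP_CHISQR), so h1 plays the role of the reference.
        return float(np.sum((h1 - h2) ** 2 / (h1 + eps)))

    def chi2_symmetric(h1, h2, eps=1e-10):
        # Divides by the sum of the two bins, so the measure is symmetric in
        # h1 and h2 (OpenCV exposes a scaled variant as HISTCMP_CHISQR_ALT).
        return float(0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

The first form is the Pearson-style statistic that treats one histogram as the expected distribution (and is therefore asymmetric); the second averages the two bins to make the distance symmetric, which is essentially the difference between the two definitions.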

Find input image (ID,passport) in imagesDB based on similarity

I would like to decide whether an image is present in a list stored in a DB (e.g. pictures of IDs, passports, student cards, etc.). I thought about using a KNN algorithm that will return the K closest images.
Options for distance metric:
sum of the Euclidean distances between corresponding pixels (img1[pixel_i], img2[pixel_i])
sum of the Euclidean distances between each pixel and every other pixel, weighted by some factor that decreases with the (pixel-to-pixel) distance
same as above, but with Manhattan distance...
Do you know of, or can you think of, a better way to deal with image similarity?
I think that using raw gray-level values to compute distances is a very bad idea. It is not invariant to illumination, translation or rotation (although I don't think rotation is a big issue in face images).
Try to use some robust, invariant descriptor extracted from each image (e.g. SIFT on keypoints) and then compute distances between those features. K-NN could work. Alternatively, look at the image-retrieval literature for more advanced approaches.
Hope this helps!
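A hedged sketch of that descriptor-based comparison, assuming an OpenCV build that ships SIFT (e.g. opencv-python >= 4.4); the function name and paths are placeholders, and the count of ratio-test matches is used as a crude similarity score:

    import cv2

    def sift_similarity(path_a, path_b, ratio=0.75):
        sift = cv2.SIFT_create()
        img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
        img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
        _, des_a = sift.detectAndCompute(img_a, None)
        _, des_b = sift.detectAndCompute(img_b, None)
        matcher = cv2.BFMatcher(cv2.NORM_L2)
        matches = matcher.knnMatch(des_a, des_b, k=2)
        # Lowe's ratio test keeps only distinctive matches.
        good = [m_n[0] for m_n in matches
                if len(m_n) == 2 and m_n[0].distance < ratio * m_n[1].distance]
        return len(good)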
If you have a large number of images in your database, it will get rather unwieldy to calculate the similarity between a given image and every single image in your database every time. Instead, I would consider something like a perceptual hash (pHash), where you pre-compute a hash ONCE for each image in your database and store it; then, when you want to compare an image, you calculate just its single pHash and compare that with all the stored ones in your database.
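A sketch of that precompute-then-lookup workflow, assuming the imagehash package (db_paths and query_path are placeholder names, not real files):

    import imagehash
    from PIL import Image

    db_paths = ["id_001.png", "id_002.png"]      # placeholder database images
    query_path = "scanned_card.png"              # placeholder query image

    # Done once, offline: one 64-bit hash per database image.
    db_hashes = {p: imagehash.phash(Image.open(p)) for p in db_paths}

    # At query time: a single hash plus cheap Hamming-distance comparisons.
    query_hash = imagehash.phash(Image.open(query_path))
    best_match = min(db_hashes, key=lambda p: query_hash - db_hashes[p])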

How to match texture similarity in images?

What are ways to quantify the texture of a portion of an image? I'm trying to detect areas in an image that are similar in texture, i.e. some measure of "how closely similar are they?"
So the question is: what information about the image (edges, pixel values, gradients, etc.) can be taken as containing its texture information?
Please note that this is not based on template matching.
Wikipedia doesn't give much detail on actually implementing any of the texture analyses.
Do you want to find two distinct areas in the same image that look the same (same texture), or match a texture in one image to another image?
The second is harder due to different radiometry.
Here is a basic scheme of how to measure the similarity of areas (a minimal code example is sketched at the end of this answer).
1. You write a function which takes an area of the image as input and calculates a scalar value, like average brightness. This scalar is called a feature.
2. You write more such functions to obtain about 8-30 features, which together form a vector that encodes information about the area of the image.
3. Calculate such a vector for both areas that you want to compare.
4. Define a similarity function which takes two vectors and outputs how much they are alike.
You need to focus on steps 2 and 4.
Step 2: use features such as the std() of brightness, some kind of corner detector, an entropy filter, a histogram of edge orientations, and a histogram of FFT frequencies (x and y directions). Use color information if available.
Step 4: you can use cosine similarity, min-max or weighted cosine.
After you implement about 4-6 such features and a similarity function, start to run tests. Look at the results and try to understand why or where it doesn't work. Then add a specific feature to cover that case.
For example, if you see that a texture with big blobs is regarded as similar to a texture with tiny blobs, then add a morphological filter that calculates the density of objects larger than 20 square pixels.
Iterate this process of identifying a problem and designing a specific feature about 5 times and you will start to get very good results.
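A minimal version of steps 1-4 with three of the suggested features (brightness std, edge-orientation histogram, FFT energy) and cosine similarity, assuming OpenCV and NumPy; the patch arguments are grayscale image regions and the function names are only illustrative:

    import cv2
    import numpy as np

    def texture_features(patch):
        patch = patch.astype(np.float32)
        # Edge-orientation histogram weighted by gradient magnitude.
        gx = cv2.Sobel(patch, cv2.CV_32F, 1, 0)
        gy = cv2.Sobel(patch, cv2.CV_32F, 0, 1)
        orientation_hist, _ = np.histogram(np.arctan2(gy, gx), bins=8,
                                           range=(-np.pi, np.pi),
                                           weights=np.hypot(gx, gy))
        # Coarse histogram of log-magnitude FFT frequencies.
        fft_hist, _ = np.histogram(np.log1p(np.abs(np.fft.fft2(patch))), bins=8)
        return np.concatenate([[patch.std()], orientation_hist, fft_hist])

    def cosine_similarity(f1, f2):
        return float(np.dot(f1, f2) / (np.linalg.norm(f1) * np.linalg.norm(f2) + 1e-10))

Each returned vector plays the role of step 3, and cosine_similarity is the step-4 comparison.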
I'd suggest using wavelet analysis. Wavelets are localized in both time and frequency and, via multiresolution analysis, give a better signal representation than the FT does.
There is a paper explaining a wavelet approach to texture description, and there is also a comparison method.
You might need to slightly modify an algorithm to process images of arbitrary shape.
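The paper's method is not reproduced here; the following is only a generic hedged sketch of a wavelet texture descriptor, assuming the PyWavelets package, using the energy of each detail sub-band of a 2-level decomposition as the feature vector:

    import numpy as np
    import pywt

    def wavelet_energy_features(patch, wavelet="db2", level=2):
        coeffs = pywt.wavedec2(patch.astype(float), wavelet, level=level)
        feats = []
        for detail in coeffs[1:]:            # (cH, cV, cD) tuple per level
            feats.extend(float(np.mean(np.square(band))) for band in detail)
        return np.array(feats)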
An interesting approach for this is to use Local Binary Patterns (LBP).
Here is a basic example and some explanations: http://hanzratech.in/2015/05/30/local-binary-patterns.html
See that method as one of the many different ways to get features from your pictures. It corresponds to the 2nd step of DanielHsH's method.
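A short example of that feature, assuming scikit-image is available (uniform LBP codes collected into a normalised histogram; gray_patch is a grayscale uint8 region):

    import numpy as np
    from skimage.feature import local_binary_pattern

    def lbp_histogram(gray_patch, P=8, R=1):
        # "uniform" LBP yields P + 2 distinct codes per pixel.
        lbp = local_binary_pattern(gray_patch, P, R, method="uniform")
        hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
        return hist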
