SIFT clustering: converting SIFT features (128-dimensional vectors) into a vocabulary - OpenCV

How do I cluster the extracted SIFT descriptors? The aim of the clustering is to use the result for classification.

Approach:
First of all, compute the SIFT descriptors for each image/object and push_back each descriptor matrix into a single Mat (let's call it Mat featuresUnclustered).
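A minimal sketch of this first step, assuming OpenCV 4.x (where SIFT lives in the main features2d module); the function name buildFeaturesUnclustered and the list of image paths are hypothetical:
// Sketch: build featuresUnclustered from a set of training images.
#include <opencv2/opencv.hpp>
#include <vector>
#include <string>
cv::Mat buildFeaturesUnclustered(const std::vector<std::string>& imagePaths)
{
    cv::Ptr<cv::SIFT> sift = cv::SIFT::create();
    cv::Mat featuresUnclustered;
    for (const std::string& path : imagePaths) {
        cv::Mat img = cv::imread(path, cv::IMREAD_GRAYSCALE);
        if (img.empty()) continue;                       // skip unreadable files
        std::vector<cv::KeyPoint> keypoints;
        cv::Mat descriptors;                             // one row per keypoint, 128 columns
        sift->detectAndCompute(img, cv::noArray(), keypoints, descriptors);
        if (!descriptors.empty())
            featuresUnclustered.push_back(descriptors);  // append rows
    }
    return featuresUnclustered;                          // M x 128, CV_32F
}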
After that, your task is to cluster all the descriptors into some number of groups/clusters (chosen by you). That number will be the size of your vocabulary/dictionary.
int dictionarySize=200;
And then, finally, comes the clustering step:
//define Term Criteria
TermCriteria tc(CV_TERMCRIT_ITER,100,0.001);
//retries number
int retries=1;
//necessary flags
int flags=KMEANS_PP_CENTERS;
//Create the BoW (or BoF) trainer
BOWKMeansTrainer bowTrainer(dictionarySize,tc,retries,flags);
//cluster the feature vectors
Mat dictionary=bowTrainer.cluster(featuresUnclustered);

To cluster, convert the per-image N x 128 descriptor arrays (N is the number of descriptors in one image) into a single M x 128 array (M is the total number of descriptors across all images), and run k-means on that data, e.g.:
# imports needed by this snippet
from numpy import zeros, zeros_like, vstack, resize
from scipy.cluster import vq

PRE_ALLOCATION_BUFFER = 1000  # rows pre-allocated per image (assumed value; not defined in the original snippet)

def dict2numpy(dict):
    nkeys = len(dict)
    array = zeros((nkeys * PRE_ALLOCATION_BUFFER, 128))
    pivot = 0
    for key in dict.keys():
        value = dict[key]
        nelements = value.shape[0]
        # grow the pre-allocated array if it is about to overflow
        while pivot + nelements > array.shape[0]:
            padding = zeros_like(array)
            array = vstack((array, padding))
        array[pivot:pivot + nelements] = value
        pivot += nelements
    # trim the unused, still-zero rows
    array = resize(array, (pivot, 128))
    return array

all_features_array = dict2numpy(all_features)
nfeatures = all_features_array.shape[0]
nclusters = 100
codebook, distortion = vq.kmeans(all_features_array, nclusters)

Usually k-means is applied to obtain the k centers; you can then turn each image into a K-dimensional vector in which each dimension counts how many of that image's patches fall into the corresponding cluster.
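A minimal sketch of that last step using OpenCV's built-in helper, assuming the dictionary from the clustering step above and an OpenCV 4.x SIFT extractor (BOWImgDescriptorExtractor does the nearest-center assignment and normalization for you):
// Sketch: represent one image as a K-dimensional bag-of-words histogram.
cv::Ptr<cv::SIFT> sift = cv::SIFT::create();
cv::Ptr<cv::DescriptorMatcher> matcher = cv::DescriptorMatcher::create("FlannBased");
cv::BOWImgDescriptorExtractor bowExtractor(sift, matcher);
bowExtractor.setVocabulary(dictionary);          // K x 128 cluster centers
cv::Mat img = cv::imread("test.png", cv::IMREAD_GRAYSCALE);
std::vector<cv::KeyPoint> keypoints;
sift->detect(img, keypoints);
cv::Mat bowHistogram;                            // 1 x K, normalized occurrence counts
bowExtractor.compute(img, keypoints, bowHistogram);
// bowHistogram can now be fed to a classifier (SVM, kNN, ...).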

Related

Euclidean distance between RGB histograms of two images

I have two pictures, each with a histogram of the R, G, B intensities. I am supposed to find the Euclidean distance between the histogram values to measure similarity.
I know the Euclidean distance formula is:
= sqrt((R1-R2)^2 + (G1-G2)^2 + (B1-B2)^2)
Since the histograms of R, G and B for each image have several values, are you supposed to take the average of all the intensity values in one histogram and then subtract it from the average of the intensity values of the other histogram?
Example 1:
Image1: R1 histogram has values of 2,3,4
Image2: R2 histogram has values of 2,3,1
Then do I do R1=(2+3+4)/3 ,R2=(2+3+1)/3
Then do I do (9-6)^2 for the value (R1-R2)^2 in sqrt((R1-R2)^2 + (G1-G2)^2 + (B1-B2)^2)?
OR
Example 2:
Image1: R1 histogram has values of 2,3,4
Image2: R2 histogram has values of 2,3,1
Then do I do (2-2)^2 + (3-3)^2 + (4-1)^2 for the (R1-R2)^2 in sqrt((R1-R2)^2 + (G1-G2)^2 + (B1-B2)^2)?
Please help me out, thanks!
Think of a histogram as a vector (maybe there are 256 bins, so it’s a 256-dimensional vector). Now compute the Euclidean distance between the two vectors:
DR = norm(R1-R2); % same as sqrt(sum((R1-R2).^2))
You can repeat this for each R, G and B component, and combine the three distances again using the Euclidean norm:
D = sqrt(DR.^2 + DG.^2 + DB.^2);
This is the same as concatenating the 3 color histograms for each image and computing their distance:
H1 = [R1,G1,B1]; % assuming histograms are row vectors
H2 = [R2,G2,B2];
D = norm(H1-H2);
I think you are mixing Normalization with Euclidean Distance.
Euclidean Distance = Sqrt( Sum( ( a[i][j] - b[i][j] )^2 ) ) for all i = 0..width, j = 0..height
a[][] and b[][] can be normalized data or non-normalized data. If you are using the raw image pixel values, they are non-normalized. You can normalize the images by dividing by the intensity range of the pixel values (min-max normalization).
So, compute the normalized images anorm[][] and bnorm[][] in the first pass where,
for(i = 0; i < width; i++) {
    for(j = 0; j < height; j++) {
        // min-max normalization: shift by the minimum, then divide by the range
        anorm[i][j] = (a[i][j] - min_a) / (max_a - min_a);
        bnorm[i][j] = (b[i][j] - min_b) / (max_b - min_b);
    }
}
Now, apply the Euclidean Distance formula on anorm[][] and bnorm[][].
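For example, a small sketch of that second pass (plain C++, continuing the variables above and assuming both images have the same width and height):
// Sketch: Euclidean distance between two normalized images of equal size (needs <cmath>).
double sumSq = 0.0;
for (int i = 0; i < width; i++) {
    for (int j = 0; j < height; j++) {
        double d = anorm[i][j] - bnorm[i][j];   // per-pixel difference
        sumSq += d * d;                         // accumulate squared differences
    }
}
double distance = sqrt(sumSq);                  // Euclidean distance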

OpenCV Gaussian Mixture Model of histogram

Given a histogram I want to train a Gaussian Mixture Model:
int calcGMMThreshold(cv::Mat & hist, cv::Mat & labels){
    cv::Mat samples(hist.rows, 2, CV_32FC1); // for building 2 dim samples
    // output variables
    cv::Mat probs, log_likelihoods;
    // building 2 dimensional Mat --> [value][#value]
    for(int i = 0; i < hist.rows; i++)
    {
        samples.at<float>(i,0) = (float)i;
        samples.at<float>(i,1) = hist.at<float>(i);
    }
    assert(samples.cols == 2);
    assert(samples.rows == 256);
    /// set up gmm
    // gmm object with 3 gmms
    cv::EM gmm(3);
    /* train gmms */
    gmm.train(samples, log_likelihoods, labels, probs);
}
When I plot the histogram together with the labels, it looks like my GMM separates the absolute values rather than the 2-dimensional input.
I would have expected 3 Gaussians with their means at the peaks of the histogram.
To compute a Gaussian mixture model, use the actual image data, not the histogram as is done in the code above.
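A minimal sketch of that idea, assuming the same OpenCV 2.4-era cv::EM API used in the question and a single-channel (grayscale) image:
// Sketch: fit a 3-component GMM to the raw pixel intensities of a grayscale image.
#include <opencv2/opencv.hpp>
cv::Mat gray = cv::imread("image.png", CV_LOAD_IMAGE_GRAYSCALE);
// one sample (row) per pixel, one feature column: the intensity value
cv::Mat samples = gray.reshape(1, (int)gray.total());
samples.convertTo(samples, CV_32FC1);
cv::Mat labels, probs, log_likelihoods;
cv::EM gmm(3);                                   // 3 Gaussian components
gmm.train(samples, log_likelihoods, labels, probs);
// labels now holds a component index per pixel; reshape back to image size
labels = labels.reshape(1, gray.rows);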

Image similarity (histogram matching/euclidean distance)

I have been searching for days but I can't seem to figure out where to start. I am trying to compare a base image with 10 other images by their color, and I am required to use either Euclidean distance or histogram matching without using OpenCV functions. I have only tried Euclidean distance. What I want to do is get the distance between each pixel in image1 and image2. I displayed the distances and I am getting very high values. What could be wrong in my code? Please help. :)
for(p=0;p<height;p++) // row
{
    for(p2=0;p2<inputHeight;p2++) // row
    {
        for(u2=0;u2<inputWidth;u2++) // col
        {
            r2 = inputData[p2*inputStep+u2*inputChannels+2];
            g2 = inputData[p2*inputStep+u2*inputChannels+1];
            b2 = inputData[p2*inputStep+u2*inputChannels+0];
        }
    }
    for(p=0;p<height;p++) // row
    {
        for(u=0;u<width;u++) // col
        {
            r = data[p*step+u*channels+2];
            g = data[p*step+u*channels+1];
            b = data[p*step+u*channels+0];
        }
    }
    euclidean=(euclidean+sqrt(pow(b2-b,2) + pow(g2-g, 2) + pow(r2-r,2)));
}
Your program tends to get very high values because you sum every pixel's Euclidean distance together:
euclidean=(euclidean+sqrt(pow(b2-b,2) + pow(g2-g, 2) + pow(r2-r,2)));
I suggest you do the following:
Compute the color histogram (feature vector) of each image.
Compute the correlation coefficient between these histograms as the difference measure of the images, as sketched below.
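For example, a minimal sketch of those two steps, computed with plain loops (no cv::calcHist/compareHist, since the question forbids OpenCV helper functions; image loading still uses cv::imread). The 256-bins-per-channel layout and the helper names are my own choices:
// Sketch: per-channel 256-bin color histograms plus a correlation coefficient.
#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>
std::vector<double> colorHistogram(const cv::Mat& bgr)   // 3*256 bins, normalized
{
    std::vector<double> hist(3 * 256, 0.0);
    for (int y = 0; y < bgr.rows; y++)
        for (int x = 0; x < bgr.cols; x++) {
            cv::Vec3b px = bgr.at<cv::Vec3b>(y, x);
            hist[0 * 256 + px[0]] += 1.0;                // B
            hist[1 * 256 + px[1]] += 1.0;                // G
            hist[2 * 256 + px[2]] += 1.0;                // R
        }
    double total = (double)bgr.rows * bgr.cols;
    for (double& h : hist) h /= total;                   // normalize by pixel count
    return hist;
}
double correlation(const std::vector<double>& h1, const std::vector<double>& h2)
{
    double m1 = 0, m2 = 0;
    for (size_t i = 0; i < h1.size(); i++) { m1 += h1[i]; m2 += h2[i]; }
    m1 /= h1.size(); m2 /= h2.size();
    double num = 0, d1 = 0, d2 = 0;
    for (size_t i = 0; i < h1.size(); i++) {
        num += (h1[i] - m1) * (h2[i] - m2);
        d1  += (h1[i] - m1) * (h1[i] - m1);
        d2  += (h2[i] - m2) * (h2[i] - m2);
    }
    return num / std::sqrt(d1 * d2);                     // 1.0 = identical histograms
}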

OpenCV Mat per-element operation: vector-matrix multiplication

I is an mxn matrix and each element of I is a 1x3 vector (I is a 3-channel Mat image actually).
M is a 3x3 matrix.
J is a matrix with the same dimensions as I, computed as follows: each element of J is the vector-matrix product of the corresponding element (i.e. the one with the same coordinates) of I and M.
I.e. if v1(r1,g1,b1) is an element of I and v2(r2,g2,b2) is its corresponding element of J, then v2 = v1 * M (this is a vector-matrix product, not a per-element product).
Question: How to compute J efficiently (in terms of speed)?
Thank you for your help.
As far as I know, the most efficient way to implement such an operation is as follows:
Reshape I from mxnx3 to (m·n)x3, let's call it I'
Calculate J' = I' * M
Reshape J' from (m·n)x3 to mxnx3, this is the J we wanted
The idea is to stack each pixel-wise operation pi'·M into one single operation P'·M, where P is the 3x(m·n) matrix containing each pixel in columns (hence P' holds one pixel per row. It's just a convention, really).
Here is a code sample written in C++:
// read some image
cv::Mat I = cv::imread("image.png"); // rows x cols x 3, 8-bit
// some matrix M, that modifies each pixel
cv::Mat M = (cv::Mat_<float>(3, 3) << 0, 0, 0,
                                      0, .5, 0,
                                      0, 0, .5); // 3 x 3
// remember old dimensions
int prevChannels = I.channels();
int prevRows = I.rows;
// reshape I
int newRows = I.rows * I.cols;
I = I.reshape(1, newRows); // (rows * cols) x 3
// matrix multiplication needs floating-point input
I.convertTo(I, CV_32F);
// compute J
cv::Mat J = I * M; // (rows * cols) x 3
// reshape to original dimensions
J = J.reshape(prevChannels, prevRows); // rows x cols x 3 (CV_32F; convert back if needed)
OpenCV provides an O(1) reshaping operation.
Thus performance depends solely on matrix multiplication, which I expect to be as efficient as possible in a computer vision library.
To further enhance performance, you might want to take a look at matrix multiplication using the ocl and gpu modules.

Subtracting a fixed value from cv::Mat objects in OpenCV

I am fairly new to OpenCV and am understanding it bit by bit. I know that the matrix operators of the cv::Mat class have been overloaded to do A.mul(B), A+B, A-B, A/B, etc.
I have two vectors which are projections of the rows and columns of an image. I have two images (S and T), so each of them has two projection vectors (rowProjectionS, columnProjectionS, rowProjectionT, columnProjectionT). I also have the means of the images (meanS, meanT). I need to do a "SUM OF PRODUCT" style calculation, which in MATLAB is as follows:
numeratorLambdaRo = sum((rowProjectionT - meanT).*(rowProjectionS - meanS));
denominatorLambdaRo = sqrt(sum((rowProjectionT - meanT).^2)*sum((rowProjectionS - meanS).^2));
LambdaRo = numeratorLambdaRo/denominatorLambdaRo;
I am not entirely sure about the capability of matrix operators in the context of cv::Mat objects.
Declare meanT and meanS as double or cv::Scalar and you can just subtract them from your matrices. You can split your operations:
rowProjectionT -= meanT;
rowProjectionS -= meanS;
// transpose one of the (row) vectors so that the multiplication is equivalent to a dot product
double numeratorLambdaRo = cv::sum(rowProjectionT * rowProjectionS.t())[0];
cv::Mat rowProjTSquare = rowProjectionT * rowProjectionT.t();
cv::Mat rowProjSSquare = rowProjectionS * rowProjectionS.t();
double denominatorLambdaRo = sqrt(cv::sum(rowProjTSquare * rowProjSSquare)[0]);
