Can we use the Euclidean distance to determine the similarity between two images
Detect Keypoint image1, image2 using SUFT
Compute Descriptor image1, image2 using SUFT
double dif = norm(des1,des2,L2_norm)----> if dif is small -> can we tell that two images similar?
If yes, so what is the threshold to lead to these two images are similar.
And another question is when I compute descriptor for those images.
I see that the size of them is not equal. Ex: des1 is 64x96, des2 is 64x85
So the norm function throw an exception.
So is there any way for us to normalize the descriptor to fixed size in order to using Euclidean distance or not?
And I use OpenCV Java API to do these above stuffs currently.
Related
Problem Formulation
Suppose I have several 10000*10000 grids (can be transformed to 10000*10000 grayscale images. I would regard image and grid as the same below), and at each grid-point, there is some value (in my case it's the number of copies of a specific gene expressed at that pixel location, note that the locations are same for every grid). What I want is to quantify the similarity between two 2D spatial point-patterns of this kind (i.e., the spatial expression patterns of two distinct genes), and rank all pairs of genes in a "most similar" to "most dissimilar" manner. Note that it is not the spatial pattern in terms of the absolute value of expression level that I care about, rather, it's the relative pattern that I care about. As a result, I might need to utilize some correlation instead of distance metrics when comparing corresponding pixels.
The easiest method might be directly viewing all pixels together as a vector and calculate some correlation metric between the two vectors. However, this does not take the spatial information into account. Those genes that I am most interested in have spatial patterns, i.e., clustering and autocorrelation effects their expression pattern (though their "cluster" might take a very thin shape rather than sticking together, e.g., genes specific to the skin cells), which means usually the image would have several peak local regions, while expression levels at other pixels would be extremely low (near 0).
Possible Directions
I am not exactly sure if I should (1) consider applying image similarity comparison algorithms from image processing that take local structure similarity into account (e.g., SSIM, SIFT, as outlined in Simple and fast method to compare images for similarity), or (2) consider applying spatial similarity comparison algorithms from spatial statistics in GIS (there are some papers about this, but I am not sure if there are some algorithms dealing with simple point data rather than the normal region data with shape (in a more GIS-sense way, I need to find an algorithm dealing with raster data rather than polygon data)), or (3) consider directly applying statistical methods that deal with discrete 2D distributions, which I think might be a bit crude (seems to disregard the regional clustering/autocorrelation effects, ~ Tobler's First Law of Geography).
For direction (1), I am thinking about a simple method, that is, first find some "peak" regions in the two images respectively and regard their union as ROIs, and then compare those ROIs in the two images specifically in a simple pixel-by-pixel way (regard them together as a vector), but I am not sure if I can replace the distance metrics with correlation metrics, and am a bit worried that many methods of similarity comparison in image processing might not work well when the two images are dissimilar. For direction (2), I think this direction might be more appropriate because this problem is indeed related to spatial statistics, but I do not yet know where to start in GIS. I guess direction (3) is somewhat masked by (2), so I might not consider it here.
Sample
Sample image: (There are some issues w/ my own data, so here I borrowed an image from SpatialLIBD http://research.libd.org/spatialLIBD/reference/sce_image_grid_gene.html)
Let's say the value at each pixel is discretely valued between 0 and 10 (could be scaled to [0,1] if needed). The shapes of tissues in the right and left subfigure are a bit different, but in my case they are exactly the same.
PS: There is one might-be-serious problem regarding spatial statistics though. The expression of certain marker genes of a specific cell type might not be clustered in a bulk, but in the shape of a thin layer or irregularly. For example, if the grid is a section of the brain, then the high-expression peak region for cortex layer-specific genes (e.g., Ctip2 for layer V) might form a thin arc curved layer in the 10000*10000 grid.
UPDATE: I found a method belonging to the (3) direction called "optimal transport" problem that might be useful. Looks like it integrates locality information into the comparison of distribution. Would try to test this way (seems to be the easiest to code among all three directions?) tomorrow.
Any thoughts would be greatly appreciated!
In the absence of any sample image, I am assuming that your problem is similar to texture-pattern recognition.
We can start with Local Binary Patterns (2002), or LBPs for short. Unlike previous (1973) texture features that compute a global representation of texture based on the Gray Level Co-occurrence Matrix, LBPs instead compute a local representation of texture by comparing each pixel with its surrounding neighborhood of pixels. For each pixel in the image, we select a neighborhood of size r (to handle variable neighborhood sizes) surrounding the center pixel. A LBP value is then calculated for this center pixel and stored in the output 2D array with the same width and height as the input image. Then you can calculate a histogram of LBP codes (as final feature vector) and apply machine learning for classifications.
LBP implementations can be found in both the scikit-image and OpenCV but latter's implementation is strictly in the context of face recognition — the underlying LBP extractor is not exposed for raw LBP histogram computation. The scikit-image implementation of LBPs offer more control of the types of LBP histograms you want to generate. Furthermore, the scikit-image implementation also includes variants of LBPs that improve rotation and grayscale invariance.
Some starter code:
from skimage import feature
import numpy as np
from sklearn.svm import LinearSVC
from imutils import paths
import cv2
import os
class LocalBinaryPatterns:
def __init__(self, numPoints, radius):
# store the number of points and radius
self.numPoints = numPoints
self.radius = radius
def describe(self, image, eps=1e-7):
# compute the Local Binary Pattern representation
# of the image, and then use the LBP representation
# to build the histogram of patterns
lbp = feature.local_binary_pattern(image, self.numPoints,
self.radius, method="uniform")
(hist, _) = np.histogram(lbp.ravel(),
bins=np.arange(0, self.numPoints + 3),
range=(0, self.numPoints + 2))
# normalize the histogram
hist = hist.astype("float")
hist /= (hist.sum() + eps)
# return the histogram of Local Binary Patterns
return hist
# initialize the local binary patterns descriptor along with
# the data and label lists
desc = LocalBinaryPatterns(24, 8)
data = []
labels = []
# loop over the training images
for imagePath in paths.list_images(args["training"]):
# load the image, convert it to grayscale, and describe it
image = cv2.imread(imagePath)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
hist = desc.describe(gray)
# extract the label from the image path, then update the
# label and data lists
labels.append(imagePath.split(os.path.sep)[-2])
data.append(hist)
# train a Linear SVM on the data
model = LinearSVC(C=100.0, random_state=42)
model.fit(data, labels)
Once our Linear SVM is trained, we can use it to classify subsequent texture images:
# loop over the testing images
for imagePath in paths.list_images(args["testing"]):
# load the image, convert it to grayscale, describe it,
# and classify it
image = cv2.imread(imagePath)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
hist = desc.describe(gray)
prediction = model.predict(hist.reshape(1, -1))
# display the image and the prediction
cv2.putText(image, prediction[0], (10, 30), cv2.FONT_HERSHEY_SIMPLEX,
1.0, (0, 0, 255), 3)
cv2.imshow("Image", image)
cv2.waitKey(0)
Have a look at this excellent tutorial for more details.
Ravi Kumar (2016) was able to extract more finely textured images by combining LBP with Gabor filters to filter the coefficients of LBP pattern
I'm training a HOG + SVM model, and my training data comes in various sizes and aspect ratios. The SVM model can't be trained on variable sized lists, so I'm looking to calculate a histogram of gradients that is the same length regardless of image size.
Is there a clever way to do that? Or is it better to resize the images or pad them?
What people usually do in such case is one of the follow two things:
Resize all images (or image patches) to a fixed size and extract the HOG features from those.
Use the "Bag of Words/Features" method and don't resize the images.
The first method 1. is quite simple but it has some problems which method 2. tries to solve.
First, think of what a hog descriptor does. It divides an image into cells of a fixed length, calculates the gradients cell-wise to generate cell-wise histograms(based on voting). At the end, you'll have a concatenated histogram of all the cells and that's your descriptor.
So there is a problem with it, because the object (that you want to detect) has to cover the images in similar manner. Otherwise your descriptor would look different depending on the location of the object inside the image.
Method 2. works as follows:
Extract the HOG features from both positive and negative images in your training set.
Use an clustering algorithm like k-means to define a fixed amount of k centroids.
For each image in your dataset, extract the HOG features and compare them element-wise to the centroids to create a frequency histogram.
Use the frequency histograms for the training of your SVM and use it for the classification phase. This way, the location doesn't matter and you'll always have a fixed sized of inputs. You'll also benefit from the reduction of dimensions.
You can normalize the images to a given target shape using cv2.resize(), divide image into number of blocks you want and calculate the histogram of orientations along with the magnitudes. Below is a simple implementation of the same.
img = cv2.imread(filename,0)
img = cv2.resize(img,(16,16)) #resize the image
gx = cv2.Sobel(img, cv2.CV_32F, 1, 0) #horizontal gradinets
gy = cv2.Sobel(img, cv2.CV_32F, 0, 1) # vertical gradients
mag, ang = cv2.cartToPolar(gx, gy)
bin_n = 16 # Number of bins
# quantizing binvalues in (0-16)
bins = np.int32(bin_n*ang/(2*np.pi))
# divide to 4 sub-squares
s = 8 #block size
bin_cells = bins[:s,:s],bins[s:,:s],bins[:s,s:],bins[s:,s:]
mag_cells = mag[:s,:s], mag[s:,:s], mag[:s,s:], mag[s:,s:]
hists = [np.bincount(b.ravel(), m.ravel(), bin_n) for b, m in zip(bin_cells,mag_cells)]
hist = np.hstack(hists) #histogram feature data to be fed to SVM model
Hope that helps!
I want to implement an OpenCV version of VL_PHOW() (matlab src code) from VLFeat. In few words, it's dense SIFT with multiple scales (increasing SIFT descriptor bin size) to make it scale invariant.
However, the authors suggests to apply a Gaussian kernel to improve the results. In paritcular, the Magnif parameter describe it:
Magnif 6 The image is smoothed by a Gaussian kernel of standard
deviation SIZE / MAGNIF. Note that, in the standard SIFT descriptor,
the magnification value is 3; here the default one is 6 as it seems to
perform better in applications.
And this is the relevant matlab code:
% smooth the image to the appropriate scale based on the size
% of the SIFT bins
sigma = opts.sizes(si) / opts.magnif ;
ims = vl_imsmooth(im, sigma) ;
My question is: how can I implement this in OpenCV? The equivalent function in OpenCV seems to be GaussianBlur, but I can't figure out how to represent the code above in terms of this function.
In my opencv project I try to compute sift features, I detect keypoints and compute descriptors. What I know it should contains 4 parameter: X, Y, Scale and Orientation. The opencv keypoint structure has pt (X, Y coordination) and angle (Orientation), but I could not understand where is the Scale parameter! can you explain me about this parameter?
the scale use for building pyramid which mean you can choose how much change of the size(scale) of the objects can be change.
for example if the object(s) moving in distance you should have more levels of pyramid in order to have better recognition.
I am having some problem with Mat in OpenCV. I was using SIFT for image classification with SVM. Now I realized that the true-positive rate was low, so I decided to add ORB feature detectors on top of SIFT. My problem is that, for example for one image:
SIFT descriptors: Mat size [128 x 250]
ORB descriptors: Mat size [32 x 400]
Now as for training matrix all the features have to be in the training matrix and than trained. Now, as you see that the 2 matrix of SIFT and ORB are of different size. How can I combine them into one matrix?.
Do I have to append (add) the second matrix to the end of the first one because currently I am assigning it to separate columns.
Please give me some hints on this please.
There are two parts to extracting features based on your solution. The first part is to detect keypoints, and the second part is to describe them. At the moment, you are doing both stages with both SIFT and ORB, and coming up with matrices of different sizes. Instead, use the following framework:
// Construct detectors
cv::FeatureDetector siftDetector, orbDetector;
siftDetector.create("SIFT");
orbDetector.create("ORB");
// Detect keypoints
std::vector<cv::Keypoint> siftPoints, orbPoints;
siftDetector.detect(img, siftPoints);
orbDetector.detect(img, orbPoints);
// Concantenate the vectors
siftPoints.insert(siftPoints.end(), orbPoints.begin(), orbPoints.end());
// Construct descriptor (SIFT used as example)
cv::FeatureDescriptor siftDescriptor;
siftDescriptor.create("SIFT");
// Compute descriptors
cv::Mat descriptors;
siftDescriptor.compute(img, siftPoints, descriptors);
You now have SIFT descriptions for all detected keypoints.
PS: I haven't compiled this code, so double check for typos and syntax.