OpenCV Matrix of different size - opencv

I am having some problems with Mat in OpenCV. I was using SIFT for image classification with an SVM. I realized that the true-positive rate was low, so I decided to add an ORB feature detector on top of SIFT. My problem is that, for example, for one image:
SIFT descriptors: Mat size [128 x 250]
ORB descriptors: Mat size [32 x 400]
For training, all the features have to go into a single training matrix, which is then used for training. As you can see, the two matrices from SIFT and ORB have different sizes. How can I combine them into one matrix?
Do I have to append the second matrix to the end of the first one? Currently I am assigning it to separate columns.
Please give me some hints on this.

There are two parts to extracting features in your approach: the first is to detect keypoints, and the second is to describe them. At the moment you are doing both stages with both SIFT and ORB and ending up with matrices of different sizes. Instead, use the following framework:
// Construct detectors
cv::Ptr<cv::FeatureDetector> siftDetector = cv::FeatureDetector::create("SIFT");
cv::Ptr<cv::FeatureDetector> orbDetector = cv::FeatureDetector::create("ORB");
// Detect keypoints
std::vector<cv::KeyPoint> siftPoints, orbPoints;
siftDetector->detect(img, siftPoints);
orbDetector->detect(img, orbPoints);
// Concatenate the keypoint vectors
siftPoints.insert(siftPoints.end(), orbPoints.begin(), orbPoints.end());
// Construct a descriptor extractor (SIFT used as example)
cv::Ptr<cv::DescriptorExtractor> siftDescriptor = cv::DescriptorExtractor::create("SIFT");
// Compute descriptors for all keypoints
cv::Mat descriptors;
siftDescriptor->compute(img, siftPoints, descriptors);
You now have SIFT descriptors for all of the detected keypoints.
PS: I haven't compiled this code, so double check for typos and syntax.

Related

How do I quantify the similarity between spatial patterns

Problem Formulation
Suppose I have several 10000*10000 grids (they can be viewed as 10000*10000 grayscale images; I will regard image and grid as the same below), and at each grid point there is some value (in my case it's the number of copies of a specific gene expressed at that pixel location; note that the locations are the same for every grid). What I want is to quantify the similarity between two 2D spatial point patterns of this kind (i.e., the spatial expression patterns of two distinct genes), and rank all pairs of genes from "most similar" to "most dissimilar". Note that it is not the spatial pattern in terms of the absolute expression level that I care about; rather, it's the relative pattern. As a result, I might need to use some correlation rather than distance metrics when comparing corresponding pixels.
The easiest method might be to directly view all pixels together as a vector and calculate some correlation metric between the two vectors. However, this does not take the spatial information into account. The genes that I am most interested in have spatial patterns, i.e., clustering and autocorrelation effects in their expression pattern (though their "cluster" might take a very thin shape rather than sticking together, e.g., genes specific to skin cells), which means the image usually has several peak local regions, while expression levels at other pixels are extremely low (near 0).
Possible Directions
I am not exactly sure which direction to take:
(1) apply image similarity comparison algorithms from image processing that take local structural similarity into account (e.g., SSIM, SIFT, as outlined in Simple and fast method to compare images for similarity);
(2) apply spatial similarity comparison algorithms from spatial statistics in GIS (there are some papers about this, but I am not sure whether any of them handle simple point data rather than the usual region data with shape; in a more GIS-sense way, I need an algorithm dealing with raster data rather than polygon data);
(3) directly apply statistical methods that deal with discrete 2D distributions, which I think might be a bit crude, since it seems to disregard the regional clustering/autocorrelation effects (Tobler's First Law of Geography).
For direction (1), I am considering a simple method: first find some "peak" regions in the two images and take their union as the ROIs, then compare those ROIs between the two images in a simple pixel-by-pixel way (treating them together as a vector). However, I am not sure whether I can replace the distance metrics with correlation metrics, and I am a bit worried that many image-processing similarity measures might not work well when the two images are dissimilar. For direction (2), I think this might be more appropriate because the problem is indeed related to spatial statistics, but I do not yet know where to start in GIS. Direction (3) seems somewhat subsumed by (2), so I might not consider it here.
Sample
Sample image: (There are some issues w/ my own data, so here I borrowed an image from SpatialLIBD http://research.libd.org/spatialLIBD/reference/sce_image_grid_gene.html)
Let's say the value at each pixel is discretely valued between 0 and 10 (could be scaled to [0,1] if needed). The shapes of tissues in the right and left subfigure are a bit different, but in my case they are exactly the same.
PS: There is one potentially serious problem regarding spatial statistics, though. The expression of certain marker genes of a specific cell type might not be clustered in a compact bulk, but in the shape of a thin layer, or irregularly. For example, if the grid is a section of the brain, then the high-expression peak region for cortex layer-specific genes (e.g., Ctip2 for layer V) might form a thin curved arc in the 10000*10000 grid.
UPDATE: I found a method in direction (3) called the "optimal transport" problem that might be useful. It seems to integrate locality information into the comparison of distributions. I will try this approach (it seems to be the easiest to code among the three directions?) tomorrow.
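A minimal sketch of what this could look like, assuming the POT library (https://pythonot.github.io/) and grids downsampled to a tractable size; the grid size and regularization strength below are illustrative choices, not tuned values:
import numpy as np
import cv2
import ot

def ot_distance(grid_a, grid_b, size=32, reg=1e-2):
    # downsample so the transport problem stays tractable
    a = cv2.resize(np.float32(grid_a), (size, size), interpolation=cv2.INTER_AREA).astype(np.float64)
    b = cv2.resize(np.float32(grid_b), (size, size), interpolation=cv2.INTER_AREA).astype(np.float64)
    # normalize to probability distributions: only the relative pattern matters
    a = a.ravel() / (a.sum() + 1e-12)
    b = b.ravel() / (b.sum() + 1e-12)
    # ground cost = squared Euclidean distance between pixel coordinates
    ys, xs = np.mgrid[0:size, 0:size]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(np.float64)
    M = ot.dist(coords, coords)
    M /= M.max()
    # entropy-regularized OT cost (Sinkhorn); smaller = more similar patterns
    return ot.sinkhorn2(a, b, M, reg)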
Any thoughts would be greatly appreciated!
In the absence of any sample image, I am assuming that your problem is similar to texture-pattern recognition.
We can start with Local Binary Patterns (2002), or LBPs for short. Unlike earlier (1973) texture features that compute a global representation of texture based on the Gray Level Co-occurrence Matrix, LBPs compute a local representation of texture by comparing each pixel with its surrounding neighborhood of pixels. For each pixel in the image, we select a neighborhood of radius r (to handle variable neighborhood sizes) surrounding the center pixel. An LBP value is then calculated for this center pixel and stored in an output 2D array with the same width and height as the input image. You can then calculate a histogram of LBP codes (as the final feature vector) and apply machine learning for classification.
LBP implementations can be found in both scikit-image and OpenCV, but the latter's implementation is strictly in the context of face recognition; the underlying LBP extractor is not exposed for raw LBP histogram computation. The scikit-image implementation of LBPs offers more control over the types of LBP histograms you want to generate. Furthermore, the scikit-image implementation also includes variants of LBPs that improve rotation and grayscale invariance.
Some starter code:
from skimage import feature
from sklearn.svm import LinearSVC
from imutils import paths
import numpy as np
import argparse
import cv2
import os

class LocalBinaryPatterns:
    def __init__(self, numPoints, radius):
        # store the number of points and radius
        self.numPoints = numPoints
        self.radius = radius

    def describe(self, image, eps=1e-7):
        # compute the Local Binary Pattern representation
        # of the image, and then use the LBP representation
        # to build the histogram of patterns
        lbp = feature.local_binary_pattern(image, self.numPoints,
            self.radius, method="uniform")
        (hist, _) = np.histogram(lbp.ravel(),
            bins=np.arange(0, self.numPoints + 3),
            range=(0, self.numPoints + 2))
        # normalize the histogram
        hist = hist.astype("float")
        hist /= (hist.sum() + eps)
        # return the histogram of Local Binary Patterns
        return hist

# parse the command-line arguments pointing to the training
# and testing image directories
ap = argparse.ArgumentParser()
ap.add_argument("-t", "--training", required=True,
    help="path to the training images")
ap.add_argument("-e", "--testing", required=True,
    help="path to the testing images")
args = vars(ap.parse_args())

# initialize the local binary patterns descriptor along with
# the data and label lists
desc = LocalBinaryPatterns(24, 8)
data = []
labels = []

# loop over the training images
for imagePath in paths.list_images(args["training"]):
    # load the image, convert it to grayscale, and describe it
    image = cv2.imread(imagePath)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    hist = desc.describe(gray)
    # extract the label from the image path, then update the
    # label and data lists
    labels.append(imagePath.split(os.path.sep)[-2])
    data.append(hist)

# train a Linear SVM on the data
model = LinearSVC(C=100.0, random_state=42)
model.fit(data, labels)
Once our Linear SVM is trained, we can use it to classify subsequent texture images:
# loop over the testing images
for imagePath in paths.list_images(args["testing"]):
    # load the image, convert it to grayscale, describe it,
    # and classify it
    image = cv2.imread(imagePath)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    hist = desc.describe(gray)
    prediction = model.predict(hist.reshape(1, -1))
    # display the image and the prediction
    cv2.putText(image, prediction[0], (10, 30), cv2.FONT_HERSHEY_SIMPLEX,
        1.0, (0, 0, 255), 3)
    cv2.imshow("Image", image)
    cv2.waitKey(0)
Have a look at this excellent tutorial for more details.
Ravi Kumar (2016) was able to extract more finely textured images by combining LBP with Gabor filters to filter the coefficients of the LBP pattern.
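A hedged sketch of one way to read that combination, reusing the LocalBinaryPatterns describer from the code above: filter the image with a small Gabor bank, compute an LBP histogram for each filtered response, and concatenate the histograms. The bank parameters are illustrative, and this is only one plausible interpretation, not the reference's exact method.
import cv2
import numpy as np

def gabor_lbp_features(gray, desc, thetas=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    feats = []
    for theta in thetas:
        # one Gabor kernel per orientation: (ksize, sigma, theta, lambda, gamma, psi)
        kernel = cv2.getGaborKernel((21, 21), 4.0, theta, 10.0, 0.5, 0)
        response = cv2.filter2D(gray, cv2.CV_32F, kernel)
        # rescale to 8-bit before computing the LBP histogram
        response = cv2.normalize(response, None, 0, 255, cv2.NORM_MINMAX).astype("uint8")
        feats.append(desc.describe(response))
    return np.concatenate(feats)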

How are the FCN heads convolved over RetinaNet's FPN features?

I've recently read the RetinaNet paper, and there is one minor detail I have yet to understand:
We have the multi-scale feature maps obtained from the FPN (P2,...P7).
Then the two FCN heads (the classifier head and the regressor head) convolve each one of the feature maps.
However, each feature map has a different spatial scale, so how do the classifier head and regressor head maintain fixed output volumes, given that all their convolution parameters are fixed (i.e., 3x3 filters with stride 1, etc.)?
Looking at this line in PyTorch's implementation of RetinaNet, I see the heads just convolve each feature map and then all the features are stacked somehow (the only common dimension between them is the channel dimension, which is 256; spatially, each level is double the size of the next).
I would love to hear how they are combined; I wasn't able to understand that point.
After the convolution at each pyramid level, you reshape the outputs to be of shape (H*W, out_dim) (with out_dim being num_classes * num_anchors for the class head and 4 * num_anchors for the bbox regressor).
Finally, you can concatenate the resulting tensors along the H*W dimension, which is now possible because all the other dimensions match, and compute losses as you would on a network with a single feature layer.
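A minimal PyTorch-style sketch of that flatten-and-concatenate step (shapes and names are illustrative, not the torchvision internals):
import torch

def flatten_head_outputs(per_level_logits, num_classes):
    flattened = []
    for logits in per_level_logits:  # each level: (B, num_anchors * num_classes, H, W)
        B, _, H, W = logits.shape
        x = logits.view(B, -1, num_classes, H, W)  # (B, A, K, H, W)
        x = x.permute(0, 3, 4, 1, 2)               # (B, H, W, A, K)
        x = x.reshape(B, -1, num_classes)          # (B, H*W*A, K)
        flattened.append(x)
    # every dimension except the "location" one now matches across levels,
    # so the levels can be concatenated and paired 1:1 with the anchors
    return torch.cat(flattened, dim=1)

# example: three pyramid levels with different spatial sizes
levels = [torch.randn(2, 9 * 80, s, s) for s in (32, 16, 8)]
print(flatten_head_outputs(levels, num_classes=80).shape)  # (2, (1024+256+64)*9, 80)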

Gaussian kernel in OpenCV to generate multiple scales

I want to implement an OpenCV version of VL_PHOW() (MATLAB source code) from VLFeat. In a few words, it is dense SIFT computed at multiple scales (increasing the SIFT descriptor bin size) to make it scale invariant.
However, the authors suggest applying a Gaussian kernel to improve the results. In particular, the Magnif parameter describes it:
Magnif (default 6): The image is smoothed by a Gaussian kernel of standard deviation SIZE / MAGNIF. Note that, in the standard SIFT descriptor, the magnification value is 3; here the default is 6 as it seems to perform better in applications.
And this is the relevant matlab code:
% smooth the image to the appropriate scale based on the size
% of the SIFT bins
sigma = opts.sizes(si) / opts.magnif ;
ims = vl_imsmooth(im, sigma) ;
My question is: how can I implement this in OpenCV? The equivalent function in OpenCV seems to be GaussianBlur, but I can't figure out how to express the code above in terms of this function.
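Not a definitive answer, but a minimal sketch of how the mapping might look with OpenCV's Python binding: GaussianBlur accepts ksize=(0, 0), in which case the kernel size is derived from sigma, which appears to be all vl_imsmooth needs here. The image path, bin sizes, and magnif value are illustrative.
import cv2

im = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)
magnif = 6.0
sizes = [4, 6, 8, 10]  # SIFT bin sizes for the dense multi-scale pass

for size in sizes:
    # smooth the image to the appropriate scale based on the SIFT bin size
    sigma = size / magnif
    ims = cv2.GaussianBlur(im, (0, 0), sigma)
    # ... run dense SIFT with bin size `size` on the smoothed image `ims`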

OpenCV similarity between two images using Euclidean distance

Can we use the Euclidean distance to determine the similarity between two images?
Detect keypoints in image1 and image2 using SURF
Compute descriptors for image1 and image2 using SURF
double dif = norm(des1, des2, NORM_L2) ----> if dif is small, can we say that the two images are similar?
If yes, what is the threshold for deciding that the two images are similar?
Another question: when I compute the descriptors for those images, I see that their sizes are not equal, e.g., des1 is 64x96 and des2 is 64x85, so the norm function throws an exception.
Is there any way to normalize the descriptors to a fixed size so that the Euclidean distance can be used?
I currently use the OpenCV Java API for the above.

Gaussian blur with FFT

I'm trying to implement a Gaussian blur using the FFT, and found the following recipe here:
This means that you can take the Fourier transform of the image and the filter, multiply the (complex) results, and then take the inverse Fourier transform.
I've got a kernel K, a 7x7 matrix, and an image I, a 512x512 matrix.
I do not understand how to multiply K by I.
Is the only way to do that to make K as big as I (512x512)?
Yes, you do need to make K as big as I by padding it with zeros. Also, after padding, but before you take the FFT of the kernel, you need to translate it with wraparound, such that the center of the kernel (the peak of the Gaussian) is at (0,0). Otherwise, your filtered image will be translated. Alternatively, you can translate the resulting filtered image once you are done.
Another point: for small kernels not using the FFT may actually be faster. A 2D Gaussian kernel is separable, meaning that you can separate it into two 1D kernels for x and y. Then instead of a 2D convolution, you can do two 1D convolutions in x and y directions in the spatial domain. For smaller kernels that may end up being faster than doing the convolution in the frequency domain using the FFT.
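A minimal numpy sketch of the padding and wraparound shift described above (the 7x7 Gaussian kernel and its sigma are illustrative):
import numpy as np

def fft_gaussian_blur(image, kernel):
    kh, kw = kernel.shape
    # pad the kernel with zeros to the full image size
    padded = np.zeros_like(image, dtype=np.float64)
    padded[:kh, :kw] = kernel
    # roll so the kernel's center lands at (0, 0), with wraparound
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    # multiply in the frequency domain and transform back
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))

# 7x7 Gaussian kernel (sigma = 1.5), normalized to sum to 1
ax = np.arange(7) - 3
g = np.exp(-(ax ** 2) / (2 * 1.5 ** 2))
kernel = np.outer(g, g)
kernel /= kernel.sum()

image = np.random.rand(512, 512)
print(fft_gaussian_blur(image, kernel).shape)  # (512, 512)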
If you are comfortable with pixel shaders, and if FFT is not your main goal here but convolution with a Gaussian blur kernel is, then I can recommend my tutorial on what convolution is.
Regards.
