How to calculate correlation of colours in a dataset? - machine-learning

In this Distill article (https://distill.pub/2017/feature-visualization/) in footnote 8 authors write:
The Fourier transforms decorrelates spatially, but a correlation will still exist
between colors. To address this, we explicitly measure the correlation between colors
in the training set and use a Cholesky decomposition to decorrelate them.
I have trouble understanding how to do that. I understand that for a single image I can calculate a correlation matrix by reshaping the image from [channels, height, width] to [channels, width*height]. But how do I take the whole dataset into account? I could average over it, but that doesn't seem to have anything to do with a Cholesky decomposition.
Inspecting the code confuses me even more (https://github.com/tensorflow/lucid/blob/master/lucid/optvis/param/color.py#L24). There's no code for calculating correlations; instead there's a hard-coded matrix (and the decorrelation happens by matrix multiplication with it). The matrix is named color_correlation_svd_sqrt, which has svd in its name, yet SVD isn't mentioned anywhere else. Also, the matrix there is not triangular, which means it can't have come from a Cholesky decomposition.
Clarifications on any points I've mentioned would be greatly appreciated.

I figured out the answer to your question here: How to calculate the 3x3 covariance matrix for RGB values across an image dataset?
In short, you calculate the RGB covariance matrix over the image dataset and then do the following calculation:
U, S, V = torch.svd(dataset_rgb_cov_matrix)
epsilon = 1e-10
svd_sqrt = U @ torch.diag(torch.sqrt(S + epsilon))
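To get dataset_rgb_cov_matrix in the first place, one option is to flatten every image to [channels, width*height], pool the pixels of all images, and take the 3x3 covariance of the RGB channels. A minimal PyTorch sketch, assuming the dataset fits in a float tensor named images of shape [N, 3, H, W] (the tensor name, shape, and random data below are assumptions for illustration):

import torch

# assumed: `images` holds the whole dataset as [N, 3, H, W] floats
images = torch.rand(100, 3, 64, 64)

# flatten to [3, N*H*W]: each RGB channel becomes one long row of pixel samples
pixels = images.permute(1, 0, 2, 3).reshape(3, -1)

# 3x3 covariance of the RGB channels over all pixels of all images
dataset_rgb_cov_matrix = torch.cov(pixels)

# then the calculation from above
U, S, V = torch.svd(dataset_rgb_cov_matrix)
epsilon = 1e-10
svd_sqrt = U @ torch.diag(torch.sqrt(S + epsilon))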

Related

Why do we normalize homography or fundamental matrix?

I want to know why we normalize the homography or fundamental matrix. Here is the code in particular.
H = H * (1.0 / H[2, 2]) # Normalization step. H is [3, 3] matrix.
I can understand that we have to normalize the data before computing the SVD because of the instability of linear least squares, but why do we normalize the matrix at the end?
A planar homography has 8 degrees of freedom by definition: it maps one plane to another under perspective. Such a homography can be defined by giving four points, which makes eight coordinates (scalars).
A 3x3 matrix has 9 elements, so it has 9 degrees of freedom. That is one degree more than needed for a homography.
The homography doesn't change when the matrix is scaled (multiplied by a scalar). All the math works the same. You don't need to normalize your homography matrix.
That said, it is a good idea to normalize.
For one, it makes the arithmetic somewhat tamer. Rather than forcing them all into one sentence, here are the relevant fields of study (each has a Wikipedia article):
Numerical analysis, Condition number, Floating-point arithmetic, Numerical error, Numerical stability, ...
Also, normalization makes the matrix easier for humans to interpret. The most common normalization is to scale the matrix so that the last element becomes 1. That is convenient because all of this math happens in projective space, where the projection maps points to the w=1 plane, so vectors have a 1 as their last element.
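A quick numerical check of the scale-invariance claim, as a small sketch with an arbitrary made-up matrix (the values of H below are placeholders, only for illustration):

import numpy as np

# an arbitrary homography, just for illustration
H = np.array([[1.2,  0.1,  5.0],
              [0.0,  0.9, -3.0],
              [1e-4, 2e-4, 2.0]])

def apply_homography(H, p):
    # map a homogeneous point and dehomogenize (divide by the last coordinate)
    q = H @ p
    return q[:2] / q[2]

p = np.array([10.0, 20.0, 1.0])          # a point in homogeneous coordinates
print(apply_homography(H, p))            # the mapped point
print(apply_homography(7.5 * H, p))      # same point: the scale cancels out
print(apply_homography(H / H[2, 2], p))  # normalized so H[2, 2] == 1, still the same point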
How is the homography matrix provided to you?
For example, consider the case where some library function calculates the homography matrix and hands it to you, and the function's specification says nothing about the scale.
In an extreme case, the function could be implemented as:
Matrix3x3 CalculateHomographyMatrix( some arguments )
{
    Matrix3x3 H = ...;                // Homography calculation
    return Non_Zero_Random_Value * H; // Wow!
}
Element values may become very large or very small, and feeding such values into your downstream processing may cause problems (floating-point computation errors).

How do I quantify the similarity between spatial patterns

Problem Formulation
Suppose I have several 10000*10000 grids (which can be treated as 10000*10000 grayscale images; I will regard image and grid as the same below), and at each grid point there is some value (in my case, the number of copies of a specific gene expressed at that pixel location; note that the locations are the same for every grid). What I want is to quantify the similarity between two 2D spatial point patterns of this kind (i.e., the spatial expression patterns of two distinct genes), and to rank all pairs of genes from "most similar" to "most dissimilar". Note that it is not the spatial pattern in terms of absolute expression level that I care about; rather, it is the relative pattern. As a result, I might need to use some correlation metric instead of a distance metric when comparing corresponding pixels.
The easiest method might be to view all pixels together as a vector and calculate some correlation metric between the two vectors. However, this does not take the spatial information into account. The genes I am most interested in have spatial patterns, i.e., clustering and autocorrelation affect their expression pattern (though their "cluster" might take a very thin shape rather than sticking together, e.g., genes specific to skin cells), which means the image usually has several peak local regions while expression levels at other pixels are extremely low (near 0).
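For reference, the flatten-and-correlate baseline described above could look like the following sketch (the array names and sizes are placeholders; Spearman is included because it compares the relative pattern via ranks, but both metrics still ignore spatial structure):

import numpy as np
from scipy.stats import pearsonr, spearmanr

# placeholder expression grids for two genes (kept small for illustration;
# the real grids would be 10000 x 10000)
gene_a = np.random.randint(0, 11, size=(100, 100)).astype(float)
gene_b = np.random.randint(0, 11, size=(100, 100)).astype(float)

# flatten each grid to one long vector and correlate pixel by pixel
r_pearson, _ = pearsonr(gene_a.ravel(), gene_b.ravel())
r_spearman, _ = spearmanr(gene_a.ravel(), gene_b.ravel())
print(r_pearson, r_spearman)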
Possible Directions
I am not exactly sure which of the following I should consider:
(1) applying image-similarity algorithms from image processing that take local structural similarity into account (e.g., SSIM, SIFT, as outlined in Simple and fast method to compare images for similarity);
(2) applying spatial-similarity algorithms from spatial statistics in GIS (there are some papers about this, but I am not sure whether any of them handle simple point data rather than the usual region data with shape; in more GIS terms, I need an algorithm for raster data rather than polygon data);
(3) directly applying statistical methods that deal with discrete 2D distributions, which I think might be a bit crude, since it seems to disregard the regional clustering/autocorrelation effects (cf. Tobler's First Law of Geography).
For direction (1), I am thinking about a simple method: first find some "peak" regions in the two images and take their union as the ROIs, then compare those ROIs across the two images in a simple pixel-by-pixel way (treating them together as a vector). However, I am not sure whether I can replace the distance metrics with correlation metrics, and I am a bit worried that many image-processing similarity methods might not work well when the two images are quite dissimilar. For direction (2), I think this might be more appropriate because the problem is indeed related to spatial statistics, but I do not yet know where to start in GIS. I guess direction (3) is somewhat subsumed by (2), so I might not consider it here.
Sample
Sample image: (There are some issues w/ my own data, so here I borrowed an image from SpatialLIBD http://research.libd.org/spatialLIBD/reference/sce_image_grid_gene.html)
Let's say the value at each pixel is discretely valued between 0 and 10 (could be scaled to [0,1] if needed). The shapes of tissues in the right and left subfigure are a bit different, but in my case they are exactly the same.
PS: There is one potentially serious problem regarding spatial statistics, though. The expression of certain marker genes of a specific cell type might not be clustered in a compact blob, but rather in a thin layer or an irregular shape. For example, if the grid is a section of the brain, then the high-expression peak region for cortex layer-specific genes (e.g., Ctip2 for layer V) might form a thin curved arc in the 10000*10000 grid.
UPDATE: I found a method belonging to direction (3) called "optimal transport" that might be useful. It looks like it integrates locality information into the comparison of distributions. I will try to test this approach (it seems to be the easiest to code among the three directions?) tomorrow; a rough sketch of the idea follows.
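One cheap way to experiment with the optimal-transport idea without a full 2D solver is a sliced approximation: treat each (normalized, non-negative) expression grid as a distribution over pixel coordinates, project the coordinates onto random directions, and average the 1D Wasserstein distances. The sketch below is only a rough illustration of that general idea under these assumptions, not a specific published method:

import numpy as np
from scipy.stats import wasserstein_distance

def sliced_ot_distance(grid_a, grid_b, n_projections=50, seed=0):
    # treat each non-negative grid as a probability distribution over pixel coordinates
    rng = np.random.default_rng(seed)
    ys, xs = np.indices(grid_a.shape)
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    w_a = grid_a.ravel().astype(float)
    w_a = w_a / w_a.sum()
    w_b = grid_b.ravel().astype(float)
    w_b = w_b / w_b.sum()
    dists = []
    for _ in range(n_projections):
        # random projection direction in the plane
        theta = rng.uniform(0.0, np.pi)
        proj = coords @ np.array([np.cos(theta), np.sin(theta)])
        # 1D optimal transport between the two weighted projections
        dists.append(wasserstein_distance(proj, proj, w_a, w_b))
    return float(np.mean(dists))

A smaller value would mean the two expression patterns are more similar in this sense.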
Any thoughts would be greatly appreciated!
In the absence of any sample image, I am assuming that your problem is similar to texture-pattern recognition.
We can start with Local Binary Patterns (2002), or LBPs for short. Unlike earlier (1973) texture features that compute a global representation of texture based on the Gray Level Co-occurrence Matrix, LBPs compute a local representation of texture by comparing each pixel with its surrounding neighborhood of pixels. For each pixel in the image, we select a neighborhood of radius r around the center pixel (which handles variable neighborhood sizes). An LBP value is then calculated for this center pixel and stored in an output 2D array with the same width and height as the input image. You can then compute a histogram of LBP codes (as the final feature vector) and apply machine learning for classification.
LBP implementations can be found in both scikit-image and OpenCV, but the latter's implementation is strictly in the context of face recognition: the underlying LBP extractor is not exposed for raw LBP histogram computation. The scikit-image implementation of LBPs offers more control over the types of LBP histograms you want to generate. Furthermore, the scikit-image implementation also includes variants of LBPs that improve rotation and grayscale invariance.
Some starter code:
from skimage import feature
import numpy as np
from sklearn.svm import LinearSVC
from imutils import paths
import argparse
import cv2
import os

class LocalBinaryPatterns:
    def __init__(self, numPoints, radius):
        # store the number of points and radius
        self.numPoints = numPoints
        self.radius = radius

    def describe(self, image, eps=1e-7):
        # compute the Local Binary Pattern representation
        # of the image, and then use the LBP representation
        # to build the histogram of patterns
        lbp = feature.local_binary_pattern(image, self.numPoints,
                                           self.radius, method="uniform")
        (hist, _) = np.histogram(lbp.ravel(),
                                 bins=np.arange(0, self.numPoints + 3),
                                 range=(0, self.numPoints + 2))
        # normalize the histogram
        hist = hist.astype("float")
        hist /= (hist.sum() + eps)
        # return the histogram of Local Binary Patterns
        return hist

# parse the paths to the training and testing image directories
ap = argparse.ArgumentParser()
ap.add_argument("-t", "--training", required=True, help="path to the training images")
ap.add_argument("-e", "--testing", required=True, help="path to the testing images")
args = vars(ap.parse_args())

# initialize the local binary patterns descriptor along with
# the data and label lists
desc = LocalBinaryPatterns(24, 8)
data = []
labels = []

# loop over the training images
for imagePath in paths.list_images(args["training"]):
    # load the image, convert it to grayscale, and describe it
    image = cv2.imread(imagePath)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    hist = desc.describe(gray)
    # extract the label from the image path, then update the
    # label and data lists
    labels.append(imagePath.split(os.path.sep)[-2])
    data.append(hist)

# train a Linear SVM on the data
model = LinearSVC(C=100.0, random_state=42)
model.fit(data, labels)
Once our Linear SVM is trained, we can use it to classify subsequent texture images:
# loop over the testing images
for imagePath in paths.list_images(args["testing"]):
    # load the image, convert it to grayscale, describe it,
    # and classify it
    image = cv2.imread(imagePath)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    hist = desc.describe(gray)
    prediction = model.predict(hist.reshape(1, -1))
    # display the image and the prediction
    cv2.putText(image, prediction[0], (10, 30), cv2.FONT_HERSHEY_SIMPLEX,
                1.0, (0, 0, 255), 3)
    cv2.imshow("Image", image)
    cv2.waitKey(0)
Have a look at this excellent tutorial for more details.
Ravi Kumar (2016) was able to extract more finely textured images by combining LBP with Gabor filters to filter the coefficients of the LBP pattern.
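As a rough illustration of that kind of LBP/Gabor combination (not necessarily the paper's exact pipeline; the frequency and LBP parameters below are arbitrary), one could Gabor-filter the image first and then compute the LBP histogram of the filtered response with scikit-image:

from skimage import feature, filters
import numpy as np

def gabor_lbp_histogram(gray, frequency=0.25, num_points=24, radius=8, eps=1e-7):
    # filter the grayscale image with a Gabor kernel (keep the real part of the response)
    gabor_real, _ = filters.gabor(gray, frequency=frequency)
    # compute uniform LBP codes on the filtered response
    lbp = feature.local_binary_pattern(gabor_real, num_points, radius, method="uniform")
    # build and normalize the LBP histogram, as in the training code above
    hist, _ = np.histogram(lbp.ravel(),
                           bins=np.arange(0, num_points + 3),
                           range=(0, num_points + 2))
    hist = hist.astype("float")
    return hist / (hist.sum() + eps)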

How can we transform an image from a plane to a vector?

I'm new to Computer Vision, but I want to explore this domain.
Right now I am learning how to detect spatio-temporal interest points. For this, I've been reading this article by Ivan Laptev.
So, I'm stuck on the transformation of an image from R^2 (plane) to R^1 (vector) (at the start of paragraph 2.1 in the article):
In the spatial domain, we can model an image f^sp : R^2 -> R by its linear scale-space representation (Witkin, 1983; Koenderink and van Doorn, 1992; Lindeberg, 1994; Florack, 1997)
I don't understand how we get R (a scalar) from an image defined on R^2.
Can somebody point me to a good article about this, or explain it themselves?
As I understand it, we use convolution with a Gaussian kernel for this. But after the convolution we still get an image defined on R^2.
If you model your image as a function f(x, y), you pass in values from R^2 (one dimension for x and one for y), and you get a one-dimensional output (a scalar) for each pair of x and y, right? It's just plain math :-)
The paragraph just states that the function operates on a neighborhood in R^2 and returns a scalar. This is true for a Gaussian: it takes a neighborhood around a point and returns a scalar, which is a weighted sum of the pixels in the neighborhood as a function of their location relative to the center of the neighborhood.
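A small sketch of that point, using a random array as a stand-in for an image and SciPy's Gaussian filter (both are assumptions for illustration): the image and its smoothed scale-space version are both functions that take a location in R^2 and return one scalar.

import numpy as np
from scipy.ndimage import gaussian_filter

# a stand-in grayscale image: intensities sampled on a grid of (x, y) locations
f = np.random.rand(256, 256)

x, y = 40, 100
print(f[y, x])   # evaluating f at one (x, y) in R^2 gives a single scalar in R

# Gaussian smoothing at scale sigma gives the scale-space representation L,
# which is again a function R^2 -> R (one weighted average per location)
L = gaussian_filter(f, sigma=3.0)
print(L[y, x])   # still one scalar per location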

Are Wavelet Coefficients simply the pixel values of decomposed image in 2D Discrete Wavelet Transform

I've been working with the Discrete Wavelet Transform, and I'm new to this theory. I want to access and modify the wavelet coefficients of the decomposed image. Are those wavelet coefficients simply the pixel values of the decomposed image in the 2D DWT?
This is, for example, the result of a DWT decomposition:
So, when I want to access and modify the wavelet coefficients, can I just iterate through the pixel values of the above image? Thank you for your help.
No. The image is merely illustrative.
The image you are looking at does not exactly correspond to the original coefficients. The original wavelet coefficients are real numbers; what you are looking at is their absolute values quantized into the range 0 to 255.
Moreover, the coefficients were not simply calculated as pairwise sums and differences of the input samples; they were calculated using two complementary filters. See the description here. Crucially, the displayed coefficients have been adjusted for visualization, so it is no longer possible to synthesize the original image from them. If you need to synthesize the image, you cannot work from the pixels of the referenced image.
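If the goal is to access and modify the actual coefficients rather than the displayed pixels, a library such as PyWavelets exposes them directly. A minimal sketch (the Haar wavelet and the random input image are assumptions for illustration):

import numpy as np
import pywt

# a stand-in grayscale image
image = np.random.rand(256, 256)

# one level of 2D DWT: approximation plus horizontal/vertical/diagonal detail coefficients
cA, (cH, cV, cD) = pywt.dwt2(image, 'haar')

# the coefficient arrays are real-valued and can be modified directly,
# e.g. zeroing out small horizontal-detail coefficients
cH[np.abs(cH) < 0.1] = 0.0

# synthesize the (modified) image from the coefficients
reconstructed = pywt.idwt2((cA, (cH, cV, cD)), 'haar')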

Cannot comprehend output of sklearn.decomposition.PCA

I am a little confused about the PCA algorithm, especially the one implemented in sklearn.
When I use PCA from sklearn.decomposition with a 4000x784 matrix
X.shape = (4000,784)
pca = PCA()
pca.fit(X)
pca.explained_variance_.shape
I get
(784,)
On the other hand, when I use another dataset with shape (50, 784), I get
(50,)
Am I doing something wrong?
Let's see:
explained_variance_ratio_ : array, [n_components]
Percentage of variance explained by each of the selected components. If n_components is not set then all components are stored and the sum of explained variances is equal to 1.0.
In the first case, your data has 4000 elements with 784 components, so the attribute gives you an array of 784 values. If this is correct, then you need to transpose the second dataset.
The maximal number of components you get with PCA is equal to the minimum dimension of your X matrix.
The explained_variance_ attribute shows you how much of the variance of the data is explained by each principal component.
These array shapes are normal: you get 784 components when you have more samples than features, but only 50 when you have just 50 rows of data.
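A quick sketch that reproduces the shapes in question with random placeholder data: with the default n_components, PCA keeps min(n_samples, n_features) components.

import numpy as np
from sklearn.decomposition import PCA

X_tall = np.random.rand(4000, 784)    # more samples than features
X_short = np.random.rand(50, 784)     # fewer samples than features

print(PCA().fit(X_tall).explained_variance_.shape)    # (784,)
print(PCA().fit(X_short).explained_variance_.shape)   # (50,)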
