Degree Matrix in Spectral Clustering - machine-learning

I am currently learning spectral clustering.
We decompose the Laplacian matrix, which is calculated as L = D - W.
W is the adjacency matrix.
However, I have found a lot of code online (for example, spectral clustering implementations) that directly calculates D as diag(sum(W)).
I know that D should be the degree matrix, which means each value on the diagonal is the degree of the corresponding point.
But if W is the adjacency matrix of a weighted graph, diag(sum(W)) is not equal to the actual "degree matrix"...
Why do they still do this?

When you work with weighted graphs, you can compute the degree matrix directly from the weighted adjacency matrix: the degree of a node is then taken to be the sum of the weights of its edges, which is what diag(sum(W)) computes. It is often good to keep the weights because they carry geometric information (for example, how similar two points are). Moreover, if you have the weighted adjacency matrix, computing the unweighted degree matrix from its binary form is easy. In addition, I think your question is more theoretical (e.g. MathOverflow) than programming-related (e.g. Stack Overflow) ;). In any case, you should consult this link for a more intuitive explanation of L and its geometric interpretation.
Good luck :)
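As a small illustration of the two notions of degree, here is a sketch in Python with NumPy. The 3-node matrix W is made up for the example, and the names D_weighted and D_binary are just labels chosen here. It shows that with the weighted degrees, every row of L = D - W sums to zero.

```python
import numpy as np

# Hypothetical symmetric weighted adjacency matrix (zero diagonal)
W = np.array([[0.0, 0.8, 0.1],
              [0.8, 0.0, 0.5],
              [0.1, 0.5, 0.0]])

# Weighted degree: sum of edge weights per node, i.e. diag(sum(W))
D_weighted = np.diag(W.sum(axis=1))

# Unweighted degree: number of neighbours, from the binary form of W
D_binary = np.diag((W > 0).sum(axis=1))

# Laplacian built from the weighted degrees
L = D_weighted - W

print(np.diag(D_weighted))  # weighted degrees
print(np.diag(D_binary))    # neighbour counts
print(L.sum(axis=1))        # each row sums to (numerically) zero
```

If D_binary were used with the weighted W instead, the rows of D - W would in general no longer sum to zero.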

Related

K-means++ clustering Algorithm

The algorithm for K-means++ is:
Take one centroid c(1), chosen uniformly at random from the dataset.
Take a new centroid c(i), choosing an instance x(i) from the dataset with probability
D(x(i))^2 / Sum(D(x(j))^2) for j = 1 to m, where D(x(i)) is the distance between the instance and the closest centroid already selected.
What is the parameter m used in the summation of the probability?
It might have been helpful to see the original formulation, but the algorithm is quite clear: during the initialization phase, for each point not yet used as a centroid, calculate the distance between that point and its nearest centroid; that is the distance D(X[i]). Then pick a random point from this set of points with probability proportional to D(X[i])^2.
In your formulation it seems m is the number of points not yet used as centroids.
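The seeding step described above can be sketched in plain Python as follows. The function names, the toy points and the fixed seed are all assumptions made for the illustration. The sum here runs over every point in the dataset; points already chosen as centroids have D = 0, so they contribute nothing and can never be picked again.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans_pp_init(points, k, rng):
    """Pick k initial centroids using D(x)^2 weighting."""
    # First centroid: uniformly at random from the dataset
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # D(x)^2 for every point: squared distance to its nearest centroid
        d2 = [min(squared_distance(p, c) for c in centroids) for p in points]
        # Sample one point with probability D(x_i)^2 / sum_j D(x_j)^2
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids

# Hypothetical toy dataset
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
centroids = kmeans_pp_init(points, k=3, rng=random.Random(0))
print(centroids)
```

This sketch assumes at least k distinct points; if all remaining distances were zero, random.choices would have no valid weights to sample from.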

How to calculate correlation of colours in a dataset?

In this Distill article (https://distill.pub/2017/feature-visualization/) in footnote 8 authors write:
The Fourier transforms decorrelates spatially, but a correlation will still exist
between colors. To address this, we explicitly measure the correlation between colors
in the training set and use a Cholesky decomposition to decorrelate them.
I have trouble understanding how to do that. I understand that for an arbitrary image I can calculate a correlation matrix by interpreting the image's shape as [channels, width*height] instead of [channels, height, width]. But how to take the whole dataset into account? It can be averaged over, but that doesn't have anything to do with Cholesky decomposition.
Inspecting the code confuses me even more (https://github.com/tensorflow/lucid/blob/master/lucid/optvis/param/color.py#L24). There's no code for calculating correlations, but there's a hard-coded version of the matrix (and the decorrelation happens by matrix multiplication with this matrix). The matrix is named color_correlation_svd_sqrt, which has svd inside of it, and SVD wasn't mentioned anywhere else. Also the matrix there is non-triangular, which means that it hasn't come from the Cholesky decomposition.
Clarifications on any points I've mentioned would be greatly appreciated.
I figured out the answer to your question here: How to calculate the 3x3 covariance matrix for RGB values across an image dataset?
In short, you calculate the RGB covariance matrix for the image dataset and then do the following calculations
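import torch
# dataset_rgb_cov_matrix: the 3x3 RGB covariance matrix of the image dataset
U, S, V = torch.svd(dataset_rgb_cov_matrix)
epsilon = 1e-10  # keeps the square root well-behaved for near-zero singular values
svd_sqrt = U @ torch.diag(torch.sqrt(S + epsilon))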
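For the part of the question about taking the whole dataset into account, one possible approach is to treat every pixel of every image as one RGB sample and compute a single 3x3 covariance matrix. The sketch below uses NumPy rather than torch, and the random images array is a hypothetical stand-in for a real dataset; it is not taken from the lucid code.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in dataset: 8 images, 3 channels, 16x16 pixels
images = rng.random((8, 3, 16, 16))

# Move channels first and flatten: one column per pixel across all images
pixels = images.transpose(1, 0, 2, 3).reshape(3, -1)

# 3x3 covariance between the R, G and B channels (rows are variables)
cov = np.cov(pixels)

# Matrix square root via SVD, mirroring the calculation in the answer
U, S, Vt = np.linalg.svd(cov)
svd_sqrt = U @ np.diag(np.sqrt(S + 1e-10))

# Sanity check: svd_sqrt @ svd_sqrt.T reproduces the covariance
print(np.allclose(svd_sqrt @ svd_sqrt.T, cov, atol=1e-8))
```

Because the covariance matrix is symmetric, the result is a square root of it but is generally not triangular, unlike a Cholesky factor.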

PCA:how eigenvectors and eigen values work

Scene: I am currently working on an image classification project. I used HOG as the image feature. The dimension of the HOG feature vector was too large to feed to a neural net, so I decided to preprocess it with PCA to reduce its dimension.
What I got: I didn't know about PCA, so I searched for tutorials on it. While learning about PCA I ran into eigenvalues and eigenvectors. What I understood was:
Let x be an n-dimensional random vector (the feature vector fed to PCA), A an n x n matrix, and l a scalar scale factor. If the transformation matrix A is chosen so that it satisfies the equation
Ax = lx
then x is called an eigenvector and l the corresponding eigenvalue of A.
Now, with the characteristic polynomial equation we find the different values of l, and then for each l we find an eigenvector.
Question:
1. I am confusing the random vector (the feature vector we need to reduce) with the matrix A containing the coefficients of the linear combination. Am I right or not? I mean, x is transformed to lx, so the transformation matrix A should be estimated for a given l, but no, A is already given.
2. If the vector x is the feature vector we already have, then why do we say that for each l (eigenvalue) we calculate x (eigenvector)?
3. Looking at the above, it is confusing: if A is given and x is unknown, are we actually considering the feature vector at all? If yes, is it A or x? We are estimating eigenvalues and eigenvectors, but where are we
considering the feature vector we want to process?
I am very confused about what is given and what we are calculating. Please help!
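A sketch that may help separate what is given from what is computed, assuming the usual PCA setup: the feature vectors are the rows of a data matrix X, the matrix A is the covariance matrix computed from X (it is not chosen freely), and the eigenvectors are directions in feature space rather than the feature vectors themselves. The data below is random and purely illustrative, and k = 2 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 200 feature vectors (rows) with 5 features each
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

# Given: the data. Computed from it: the covariance matrix A (5x5)
Xc = X - X.mean(axis=0)
A = np.cov(Xc, rowvar=False)

# Computed from A: eigenvalues l and eigenvectors (columns) with A v = l v
eigvals, eigvecs = np.linalg.eigh(A)
order = np.argsort(eigvals)[::-1]          # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep the top k directions and project each feature vector onto them
k = 2
W = eigvecs[:, :k]
Z = Xc @ W                                 # reduced features, shape (200, 2)
print(Z.shape)
```

In this reading, each original feature vector only enters through X; the eigenvectors define the new axes, and the reduced feature vector is the projection onto those axes.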

Handling zero rows/columns in covariance matrix during em-algorithm

I tried to implement GMMs but I have a few problems during the EM algorithm.
Let's say I've got 3D samples (stat1, stat2, stat3) which I use to train the GMMs.
One of my training sets for one of the GMMs has a "0" for stat1 in nearly every sample. During training I get really small numbers (like "1.4456539880060609E-124") in the first row and column of the covariance matrix, which leads to 0.0 in the first row and column in the next iteration of the EM algorithm.
I get something like this:
0.0 0.0 0.0
0.0 5.0 6.0
0.0 2.0 1.0
I need the inverse covariance matrix to calculate the density, but since one column is zero I can't compute it.
I thought about falling back to the old covariance matrix (and mean), or replacing every 0 with a really small number.
Or is there another simple solution to this problem?
Simply put, your data lies in a degenerate subspace of your actual input space, and a GMM in its most generic form is not well suited to such a setting. The problem is that the empirical covariance estimator you use simply fails for such data (as you said, you cannot invert it). What do you usually do? You change the covariance estimator to a constrained or regularized one. Options include:
Constant-based shrinking: instead of using Sigma = Cov(X) you use Sigma = Cov(X) + eps * I, where eps is a predefined small positive constant and I is the identity matrix. Consequently you never have zero values on the diagonal, and it is easy to prove that for any eps > 0 the result is invertible.
Data-driven shrinking, like the Oracle Approximating Shrinkage (OAS) estimator or the Ledoit-Wolf covariance estimator, which find the best shrinkage amount based on the data itself.
Constraining your Gaussians to, for example, the spherical family N(m, sigma I), where sigma = avg_i( cov( X[:, i] ) ) is the mean variance per dimension. This limits you to spherical Gaussians, and also solves the above issue.
There are many more possible solutions, but all are based on the same thing: change the covariance estimator in such a way that you have a guarantee of invertibility.
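The constant-based shrinking option can be sketched with NumPy as below. The samples are made up so that stat1 is always 0, and eps = 1e-6 is an arbitrary illustrative value, not a recommendation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical samples (stat1, stat2, stat3) where stat1 is always 0
X = np.column_stack([np.zeros(50), rng.normal(size=50), rng.normal(size=50)])

# Empirical covariance: first row and column are exactly zero, so it is singular
cov = np.cov(X, rowvar=False)
rank = np.linalg.matrix_rank(cov)

# Constant-based shrinking: Sigma = Cov(X) + eps * I
eps = 1e-6
cov_reg = cov + eps * np.eye(cov.shape[0])

# The regularized matrix is positive definite, so the inverse exists
precision = np.linalg.inv(cov_reg)
print(rank, np.linalg.eigvalsh(cov_reg).min() > 0)
```

In an EM loop, this addition would be applied to each component's covariance after every M-step, before computing densities.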

Margin for optimal decision plane

For a given dataset of 2-D input data, we apply the SVM learning
algorithm and achieve an optimal decision plane:
H(x) = x^1 + 2x^2 + 3
What is the margin of this SVM?
I've been looking at this for hours trying to work out how to answer this. I think it's meant to be relatively simple but I've been searching through my learning material and cannot find how I'm meant to answer this.
I'd appreciate some help on the steps I should use to solve this.
Thanks.
It is impossible to calculate the margin given only the optimal decision plane. You would need the support vectors, or at least samples from the classes.
Anyway, you can follow these steps:
1- Calculate the Lagrange multipliers (alphas). I don't know which environment you work in, but you can use MATLAB's quadratic programming solver, quadprog(); it is not hard to use.
2- Find the support vectors. Remember, only the alphas of support vectors are nonzero (the alphas of all other samples are zero), so you can find the support vectors of each class.
3- Calculate the vector w, which is orthogonal to the optimal hyperplane. You can use the summation below to calculate this vector:
w = sum over i of alpha(i) * y(i) * phi(x(i))
where,
alpha(i): alphas (Lagrange multipliers) of the support vectors;
y(i) : labels of the samples (say -1 or +1);
phi() : feature mapping of the kernel function;
x(i) : support vectors.
4- Take one support vector from each class, say SV1 from class 1 and SV2 from class 2. Now you can calculate the margin using vector projection and the dot product:
margin = < (SV1 - SV2), w > / norm(w)
where,
<(SV1 - SV2), w> : dot product of vector (SV1 - SV2) and vector w
norm(w) : norm of vector w
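As a worked illustration of the projection formula, here is a sketch in Python for the plane from the question, reading x^1 and x^2 as the two input components so that w = (1, 2) and b = 3. The two support vectors are hypothetical: they are picked so that H(x) = +1 and H(x) = -1, which assumes the plane is given in the usual canonical SVM scaling. That scaling is an assumption and is not stated in the question.

```python
import math

# Decision function from the question: H(x) = x1 + 2*x2 + 3
w = (1.0, 2.0)
b = 3.0

def H(x):
    """Value of the decision function at point x = (x1, x2)."""
    return w[0] * x[0] + w[1] * x[1] + b

# Hypothetical support vectors, chosen so that H(sv1) = +1 and H(sv2) = -1
sv1 = (0.0, -1.0)
sv2 = (0.0, -2.0)

# margin = <(SV1 - SV2), w> / norm(w)
norm_w = math.hypot(*w)
diff = (sv1[0] - sv2[0], sv1[1] - sv2[1])
margin = (diff[0] * w[0] + diff[1] * w[1]) / norm_w

print(H(sv1), H(sv2))
print(margin, 2.0 / norm_w)
```

Under that scaling assumption the projection formula agrees with 2 / norm(w); with a differently scaled plane the actual support vectors would be needed, as the answer says.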

Resources