The algorithm for the K-means++ is:
Take one centroid c(i), chosen uniformly at random from the dataset.
Take a new Centroid c(i), choosing an instance x(i) from the dataset with the probability
D(X(i))^2/Sum(D(X(j))^2) from j=1 to m, where D(X(i)) is the distance between the instance and the closest centroid which is selected.
What is this parameter m used in the summation of the probability?
It might have been helpful to see the original formulation, but the algorithm is quite clear: during the initialization phase, for each point not used as a centroid, calculate the distance between said point and the nearest centroid, that will be the distance D(X[i]), the pick a random point in this set of points with probability weighted with D(X[i])^2
In your formulation it seems you got m points unused.
Related
In this Distill article (https://distill.pub/2017/feature-visualization/) in footnote 8 authors write:
The Fourier transforms decorrelates spatially, but a correlation will still exist
between colors. To address this, we explicitly measure the correlation between colors
in the training set and use a Cholesky decomposition to decorrelate them.
I have trouble understanding how to do that. I understand that for an arbitrary image I can calculate a correlation matrix by interpreting the image's shape as [channels, width*height] instead of [channels, height, width]. But how to take the whole dataset into account? It can be averaged over, but that doesn't have anything to do with Cholesky decomposition.
Inspecting the code confuses me even more (https://github.com/tensorflow/lucid/blob/master/lucid/optvis/param/color.py#L24). There's no code for calculating correlations, but there's a hard-coded version of the matrix (and the decorrelation happens by matrix multiplication with this matrix). The matrix is named color_correlation_svd_sqrt, which has svd inside of it, and SVD wasn't mentioned anywhere else. Also the matrix there is non-triangular, which means that it hasn't come from the Cholesky decomposition.
Clarifications on any points I've mentioned would be greatly appreciated.
I figured out the answer to your question here: How to calculate the 3x3 covariance matrix for RGB values across an image dataset?
In short, you calculate the RGB covariance matrix for the image dataset and then do the following calculations
U,S,V = torch.svd(dataset_rgb_cov_matrix)
epsilon = 1e-10
svd_sqrt = U # torch.diag(torch.sqrt(S + epsilon))
I have hypothesis function h(x) = theta0 + theta1*x.
How can I select theta0 and theta1 value for the linear regression model?
The question is unclear whether you would like to do this by hand (with the underlying math), use a program like Excel, or solve in a language like MATLAB or Python.
To start, here is a website offering a summary of the math involved for a univariate calculation: http://www.statisticshowto.com/probability-and-statistics/regression-analysis/find-a-linear-regression-equation/
Here, there is some discussion of the matrix formulation of the multivariate problem (I know you asked for univariate but some people find the matrix formulation helps them conceptualize the problem): https://onlinecourses.science.psu.edu/stat501/node/382
We should start with a bit of an intuition, based on the level of the question. The goal of a linear regression is to find a set of variables, in your case thetas, that minimize the distance between the line formed and the data points observed (often, the square of this distance). You have two "free" variables in the equation you defined. First, theta0: this is the intercept. The intercept is the value of the response variable (h(x)) when the input variable (x) is 0. This visually is the point where the line will cross the y axis. The second variable you have defined is the slope (theta1), this variable expresses how much the response variable changes when the input changes. If theta1 = 0, h(x) does not change when x changes. If theta1 = 1, h(x) increases and decreases at the same rate as x. If theta1 = -1, h(x) responds in the opposite direction: if x increases, h(x) decreases by the same amount; if x decreases, h(x) increases by the quantity.
For more information, Mathworks provides a fairly comprehensive explanation: https://www.mathworks.com/help/symbolic/mupad_ug/univariate-linear-regression.html
So after getting a handle on what we are doing conceptually, lets take a stab at the math. We'll need to calculate the standard deviation of our two variables, x and h(x). WTo calculate the standard deviation, we will calculate the mean of each variable (sum up all the x's and then divide by the number of x's, do the same for h(x)). The standard deviation captures how much a variable differs from its mean. For each x, subtract the mean of x. Sum these differences up and then divide by the number of x's minus 1. Finally, take the square root. This is your standard deviation.
Using this, we can normalize both variables. For x, subtract the mean of x and divide by the standard deviation of x. Do this for h(x) as well. You will now have two lists of normalized numbers.
For each normalized number, multiply the value by its pair (the first normalized x value with its h(x) pair, for all values). Add these products together and divide by N. This gives you the correlation. To get the least squares estimate of theta1, calculate this correlation value times the standard deviation of h(x) divided by the standard deviation of x.
Given all this information, calculating the intercept (theta0) is easy, all we'll have to do is take the mean of h(x) and subtract the product (multiply!) of our calculated theta1 and the average of x.
Phew! All taken care of! We have our least squares solution for those two variables. Let me know if you have any questions! One last excellent resource: https://people.duke.edu/~rnau/mathreg.htm
If you are asking about the hypothesis function in linear regression, then those theta values are selected by an algorithm called gradient descent. This helps in finding the theta values to minimize the cost function.
I'm new in Computer Vision, but I'm want to discover this domain.
Now I learn how to detect spatial-temporal interest points. To this, I've read this article of Ivan Laptev.
So, I stuck on transformation image from R2(plane) to R1(vector). (in this article paragraph 2.1 in the start):
In the spatial domain, we can model an image f(sp):R^2->R its linear scale-space representation (Witkin, 1983; Koenderink and van Doorn, 1992;
Lindeberg, 1994; Florack, 1997) 2
I don't understand, how we get 1(image from R^2, R)
Can somebody give good article about this, or explain by himself?
As I understand, we use convolution with Gaussian kernel to this. But, after convolution we get also image R^2.
If you model your image as a function f(x,y) you pass values in R^2 (one dimension for each, x and one for y). And you get a one dimensional output (scalar) for each pair of x and y, right? Just stupid math:-)
The paragraph just state that the function operates on a neighborhood in R^2 and returns a scalar. This is true for a Gaussian it takes a neighborhood around a point and returns a scalar which is a weighted sum of the pixels in the neighborhood as a function of there location in relation to the center of the neighborhood.
I have a Kalman filter tracking a point, with a state vector (x, y, dx/dt, dy/dt).
At a given update, I have a set of candidate points which may correspond to the tracked points. I would like to iterate through these candidates and choose the one most likely to correspond to the tracked point, but only if the probability of that point corresponding to the tracked point is greater than a threshold (e.g. p > 0.5).
Therefore I need to use the covariance and state matrices of the filter to estimate this probability. How can I do this?
Additionally, note that my state vector is four dimensions, but the measurements are in two dimensions (x, y).
When you predict the measurements with y = Hx you also compute the covariance of y as H*P*H.T. This property is why we use variance in the Kalman Filter.
The geometrical way to understand how far a given point is from your predicted point is a error ellipse or confidence region. A 95% confidence region is the ellipse scaled to 2*sigma (if that isn't intuitive, you should go read about normal distributions, because that is what the KF thinks it is working on). If the covariance is diagonal, the error ellipse will be axis aligned. If there are co-varying terms (which there may not be if you have not introduced them anywhere via Q or R) then the ellipse will be tilted.
The mathematical way is with the Mahalanobis distance, which just directly formulates the geometrical representation above as a distance. The distance scale is standard deviations, so your P=0.5 corresponds to a distance of 0.67 (again, see normal distributions if this is surprising).
The most probable point (I suppose from detections) will be the nearest point to filter prediction.
I am currently learning spectral clustering.
We decomposite the Laplacian Matrix which calculated by L = D - W.
W is the adjacent matrix.
However, I have found a lot codes online like
spectral clustering
they directly calculate D by diag(sum(W)).
I know that D should be degree matrix which means each value on the diagonal are the degree for each point.
But if W is a weighted graph , diag(sum(W)) is not equal to the actual "Degree matrix"...
Why they still do this.
When you work with weighted graphs you can compute the degree matrix from the weighted adjacency matrix, some time it is good to have weights because they hide geometric information. Moreover, if you have the weighted adj matrix computing degree matrix using the binary form of your weighted adj matrix is easy. In addition, I think your question has more theoretical (e.g. Mathoverflow) than programming foundation (e.g stackoverflow) ;). In any case, you should consult this link for more intuitive explanation of L and its geometric relation.
Good luck :)