I have an even number of points in 2D. I need an algorithm that pairs up those points such that the total sum of the distances between pairs is maximized.
I don't think dynamic programming or a greedy approach will work.
Can I use linear programming or the Hungarian algorithm? Or something else?
You certainly can use integer linear programming. Here is an example formulation:
Introduce a binary variable x[ij] for each unordered pair of distinct points i and j (i.e. such that i<j), where x[ij]=1 iff the points i and j are paired together.
Compute all the distances d[ij] (for i<j).
The objective is to maximize sum_[i<j] d[ij]*x[ij], subject to the constraint that each point is in exactly one pair, i.e. for every point k, sum_[i<k] x[ik] + sum_[j>k] x[kj] = 1.
Note that this also works for 3D points: you only need the pairwise distances between points.
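As a sketch of how this formulation might look in code, here is a minimal version using Python with the PuLP library (the points, solver defaults, and variable names are mine, not from the question):

from itertools import combinations
import math
import pulp

# Toy input; any even number of 2D (or 3D) points works.
points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
n = len(points)

# d[(i, j)]: Euclidean distance for each unordered pair i < j.
d = {(i, j): math.dist(points[i], points[j])
     for i, j in combinations(range(n), 2)}

prob = pulp.LpProblem("max_weight_matching", pulp.LpMaximize)
x = pulp.LpVariable.dicts("x", d.keys(), cat="Binary")

# Objective: maximize the total distance over the chosen pairs.
prob += pulp.lpSum(d[ij] * x[ij] for ij in d)

# Each point must appear in exactly one chosen pair.
for k in range(n):
    prob += pulp.lpSum(x[ij] for ij in d if k in ij) == 1

prob.solve()
print([ij for ij in d if x[ij].value() > 0.5])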
I want to do feature scaling on datasets using means and standard deviations, and my code is below; but apparently it is not universal code, since it seems to work with only one dataset. So I am wondering what is wrong with my code. Any help will be appreciated, thanks!
X is the dataset I am currently using.
mu = mean(X);
sigma = std(X);
m = size(X, 1);
mu_matrix = ones(m, 1) * mu;
sigma_matrix = ones(m, 1) * sigma;
featureNormalize = (X-mu_matrix)/sigma;
Thank you for clarifying what you think the code should be doing in the comments.
My answer will explain why what you think is happening is not what is actually happening.
First let's talk about the mean and std functions. When their input is a vector (whether this is vertically or horizontally aligned), then this will return a single number which is the mean or standard deviation of that vector respectively, as you might expect.
However, when the input is a matrix, you need to know what they do differently. Unless you specify the dimension along which to operate, they will compute the mean / std down each column, returning a single number per column. Therefore, the end result of this operation is a horizontal vector.
Therefore, both mu and sigma will be horizontal vectors in your code.
Now let's move on to the 'matrix multiplication' operator (i.e. *).
When using the matrix multiplication operator, multiplying a horizontal vector by a vertical vector (the usual inner product) produces a single number (a scalar). However, if you reverse the orientations and multiply a vertical vector by a horizontal one, you instead compute an outer product, which in this case coincides with a 'Kronecker product'. Since the output of the * operation is completely defined by the rows of the first input and the columns of the second input, which of these you get depends entirely on the orientation of your inputs.
Therefore, in your case, the line mu_matrix = ones(m, 1) * mu; is not in fact appending a vector of ones, as you say. It is performing the Kronecker product between a vertical vector of ones and the horizontal vector mu, effectively creating an m-by-n matrix with mu repeated vertically for m rows.
Therefore, at the end of this operation, as the variable naming would suggest, mu_matrix is in fact a matrix (same with sigma_matrix), having the same size as X.
Your final step is X - mu_matrix, which gives you, at each element, the difference between X and mu at that position. Then you "divide" by the sigma matrix.
This is why I asked if you were sure you should be using ./ rather than /.
/ is the matrix division operator: you are effectively performing matrix multiplication by an inverse matrix, since D / S is mathematically equivalent to D * inv(S). What you are actually trying to do is normalise each row (i.e. observation) of a particular column by the standard deviation specific to that column (i.e. feature). So it seems to me you should be using ./ instead, to simply divide each element by the standard deviation of its column; that is exactly why you had to repeat the horizontal vector over m rows in sigma_matrix, so that you could use it for elementwise division. In other words, the last line should read featureNormalize = (X - mu_matrix) ./ sigma_matrix;
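For comparison, here is a minimal sketch of the intended per-feature normalisation in Python/numpy (purely illustrative; note that numpy's / is already elementwise, and broadcasting makes the ones(m, 1) repetition unnecessary):

import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])   # toy dataset: rows are observations, columns are features

mu = X.mean(axis=0)            # column means, like MATLAB's mean(X)
sigma = X.std(axis=0, ddof=1)  # column stds; ddof=1 matches MATLAB's std(X)

X_norm = (X - mu) / sigma      # elementwise division, broadcast over rows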
I am interested in superpixels extracted via energy-driven sampling (SEEDS), which is a method of image segmentation using superpixels. This is also what OpenCV uses to create superpixels. I am having trouble finding documentation on the SEEDS algorithm. OpenCV gives a very general description, which can be found here.
I am looking for a more in-depth description of how SEEDS functions (either a general walk-through or a mathematical explanation). Any links or thoughts concerning the algorithm would be much appreciated! I can't seem to find any good material. Thanks!
I will first go through some general links and resources and then try to describe the general idea of the algorithm.
SEEDS implementations:
You obviously already saw the documentation here. A usage example for OpenCV's SEEDS implementation can be found at Itseez/opencv_contrib/modules/ximgproc/samples/seeds.cpp; it lets you adapt the number of superpixels, the number of levels and other parameters live, so after reading up on the idea behind SEEDS you should definitely try the example. The original implementation, as well as a revised implementation (part of my bachelor thesis), can be found on GitHub: davidstutz/superpixels-revisited/lib_seeds and davidstutz/seeds-revised. The implementations should be pretty comparable, though.
Publication and other resources:
The paper was released on arXiv: arxiv.org/abs/1309.3848. A somewhat shorter description (which may be easier to follow) is available on my website: davidstutz.de/efficient-high-quality-superpixels-seeds-revised. The provided algorithm description should be easy to follow and, in the best case, allow you to implement SEEDS (see the "Algorithm" section of the article). A more precise description can also be found in my bachelor thesis, in particular in section 3.1.
General description:
Note that this description is based on both the above mentioned article and my bachelor thesis. Both offer a mathematically concise description.
Given an image with width W and height H, SEEDS starts by grouping pixels into blocks of size w x h. These blocks are further arranged into groups of 2 x 2. This scheme is repeated for L levels (this is the number of levels parameter). So at level l, you have blocks of size
w*2^(l - 1) x h*2^(l - 1).
The number of superpixels is determined by the blocks at level L, i.e. letting w_L and h_L denote the width and height of the blocks at level L, the number of superpixels is
S = W/w_L * H/h_L
where we use integer divisions.
This gives the initial superpixel segmentation, which is now iteratively refined by exchanging blocks of pixels and individual pixels between neighboring superpixels. To this end, color histograms of the superpixels and of all blocks are computed (the histograms are determined by the number of bins parameter in the implementation). This can be done efficiently by noting that the histogram of a superpixel is just the sum of the histograms of the 2 x 2 blocks it consists of, and the histogram of one of these blocks is the sum of the histograms of the 2 x 2 underlying blocks (and so on). So let h_i be the histogram of a block of pixels belonging to superpixel j, and h_j the histogram of this superpixel. Then, the similarity of block i to superpixel j is computed by the histogram intersection of h_i and h_j (see one of the above resources for the equation). Similarly, the similarity of a pixel and a superpixel is either the Euclidean distance of the pixel color to the superpixel mean color (this is the better performing option), or the probability of the pixel's color belonging to the superpixel (which is simply the normalized entry of the superpixel's histogram at the pixel's color). With this background, the algorithm can be summarized as follows:
initialize block hierarchy and the initial superpixel segmentation
for l = 1 to L // go through all levels, bottom-up
// for level l = L these are the initial superpixels
for each block in level l
initialize the color histogram of this block
// as described this is done using the histograms of the level below
// now we start exchanging blocks between superpixels
for l = L - 1 to 1
for each block at level l
if the block lies at the border to a superpixel it does not belong to
compute the histogram intersection with both superpixels
assign the block to the superpixel with the highest intersection
// now we exchange individual pixels between superpixels
for all pixels
if the pixel lies at the border to a superpixel it does not belong to
compute the Euclidean distance of the pixel to both superpixel's mean color
assign the pixel to the closest superpixel
In practice, the block updates and pixel updates are iterated more than once (this is the number of iterations parameter), and often twice as many iterations per level are done (this is the double step parameter). In the original implementation, the number of superpixels is computed from w, h, L and the image size. In OpenCV, using the above equations, w and h are computed from the desired number of superpixels and the number of levels (which are determined by the corresponding parameters).
One parameter remains unclear: the prior, which tries to enforce smooth boundaries. In practice this is done by considering the 3 x 3 neighborhood around a pixel that is about to be updated. If most of the pixels in this neighborhood belong to superpixel j, the pixel to be updated is also more likely to belong to superpixel j (and vice versa). Both OpenCV's implementation and my implementation (SEEDS Revised) allow considering larger k x k neighborhoods, with k in {0,...,5} in the case of OpenCV.
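As a toy illustration of the block update step described above, here is a minimal Python sketch of the histogram intersection test (the function names and the assumption of normalized histograms are mine, not taken from any of the implementations):

import numpy as np

def histogram_intersection(h1, h2):
    # Both histograms are assumed to be normalized to sum to 1.
    return np.minimum(h1, h2).sum()

def assign_block(block_hist, own_superpixel_hist, neighbor_superpixel_hist):
    # A border block is moved to the neighboring superpixel if its
    # histogram intersects more with that superpixel's histogram.
    own_score = histogram_intersection(block_hist, own_superpixel_hist)
    neighbor_score = histogram_intersection(block_hist, neighbor_superpixel_hist)
    return "neighbor" if neighbor_score > own_score else "own"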
I understood the overall SVM algorithm, including the Lagrangian duality and all, but I am not able to understand why the Lagrange multiplier is greater than zero specifically for support vectors.
Thank you.
This might be a late answer but I am putting my understanding here for other visitors.
The Lagrange multiplier, usually denoted by α, is a vector of the weights of all the training points as support vectors.
Suppose there are m training examples. Then α is a vector of size m. Now focus on the ith element of α: αi. Clearly, αi captures the weight of the ith training example as a support vector. A higher value of αi means the ith training example holds more importance as a support vector; that is, if a prediction is to be made, the ith training example will be more important in deriving the decision.
Now coming to the OP's concern:
I am not able to understand why particularly the Lagrangian multiplier
is greater than zero for support vectors.
It is just a construct. When αi=0, it simply means that the ith training example has zero weight as a support vector; equivalently, you can say that the ith example is not a support vector.
Side note: One of the KKT conditions is complementary slackness: αi*gi(w)=0 for all i. A support vector must lie on the margin, which implies gi(w)=0; then αi may or may not be zero, and the complementary slackness condition is satisfied either way.
For αi=0, you can choose whether you want to call such points support vectors or not, based on the discussion above. But for a non-support vector, αi must be zero to satisfy complementary slackness, since gi(w) is not zero.
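You can see this numerically with scikit-learn (the toy data below is made up): SVC stores yi*αi in dual_coef_ for the support vectors only, and any training point not listed in support_ has αi = 0.

import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [3, 1], [-1, -1], [4, 4]])
y = np.array([1, 1, -1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C approximates a hard margin
print(clf.support_)    # indices of the support vectors (alpha_i > 0)
print(clf.dual_coef_)  # y_i * alpha_i for those support vectors only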
I can't figure this out either...
If we take a simple example, say 3 data points, two of the positive class (yi=1), (1,2) and (3,1), and one negative (yi=-1), (-1,-1), and we solve using Lagrange multipliers, we get a perfect w = (0.25, 0.5) and b = -0.25, but one of our alphas is negative (a1 = 6/32, a2 = -1/32, a3 = 5/32).
I have a set of about 200 points (x,y) from an image. The 200 points belong to 11 classes (which I think will become the class labels). My problem is how to represent the x,y values as one data element.
My first thought was to represent them separately with the labels, and then when I get a point to classify, classify x and y separately. Something tells me this is incorrect.
Please advise me on how to represent the x,y value as one data element.
I can't see what problem you've met. In the kNN algorithm we can use variables with multiple dimensions; you just need to use a list from the Python standard library or an array from the numpy library to organize the data, such as: group = numpy.array([[1.0,1.1],[1.0,1.0],[0,0],[0,0.1]])
or group = [[1.0,1.1],[1.0,1.0],[0,0],[0,0.1]] to represent (1.0,1.1), (1.0,1.0), (0,0), (0,0.1).
However, I suggest using numpy, as it has many functions that are implemented in C, which ensures the efficiency of your programs.
If you use numpy, it is better to do all the operations in a vectorized (matrix) way; for example, you can use point = tile([0,0], (4,1)) and distance(group - point) (where distance is a function written by me) to calculate the distances without iteration.
The key is not the representation but the distance calculation. The points in your case are essentially single elements with two dimensions (x, y). The kNN algorithm can handle the n-dimensional case by itself: it finds the k nearest neighbors. So you can use the Euclidean distance d((x1, y1), (x2, y2)) = ((x1-x2)^2 + (y1-y2)^2)^0.5, where (x1, y1) represents the first point, as the distance between points in your case.
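For instance, a minimal numpy sketch of this distance computation (the group data reuses the example from the earlier answer; the query point and labels are made up):

import numpy as np

group = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
labels = ["A", "A", "B", "B"]
point = np.array([0.2, 0.1])

# Euclidean distance from `point` to every row of `group` via broadcasting.
distances = np.sqrt(((group - point) ** 2).sum(axis=1))
k_nearest = np.argsort(distances)[:3]   # indices of the 3 nearest neighbors
print([labels[i] for i in k_nearest])   # their class labels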
I have some problems understanding Euclidean distance. I have two different entities and I want to measure the similarity between these entities.
Let's suppose that entity A has 2 feature vectors and entity B has only 1 feature vector. How am I supposed to calculate the Euclidean distance between these two entities in order to determine their similarity?
Thanks a lot.
You can calculate the Euclidean distance only for vectors of the same dimension. But you could define default values for the features that are missing in entity B.
The L2 distance is defined between two feature vectors, so you first have to reduce each entity to a single vector. These would be two natural ways of doing it:
You could take the minimum L2 distance between all the feature vectors of entity 1 and all the feature vectors of entity 2. For example, if entity 1 has two vectors, A=[1,3,2,1] and B=[3,2,4,1], and entity 2 has one vector, C=[1,2,4,2], then dist = min(d([1,3,2,1],[1,2,4,2]), d([3,2,4,1],[1,2,4,2])).
You could take the average of all the vectors of entity 1 and the average vector of entity 2, then compute the L2 distance between these averages. With the same vectors as above, dist = d([(1+3)/2,(3+2)/2,(2+4)/2,(1+1)/2],[1,2,4,2]).
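A quick numpy check of both options, using the vectors from the examples above (for illustration only):

import numpy as np

A = np.array([1, 3, 2, 1])   # entity 1, first vector
B = np.array([3, 2, 4, 1])   # entity 1, second vector
C = np.array([1, 2, 4, 2])   # entity 2

# Option 1: minimum pairwise L2 distance.
d_min = min(np.linalg.norm(A - C), np.linalg.norm(B - C))

# Option 2: L2 distance between the mean vector of entity 1 and C.
d_mean = np.linalg.norm((A + B) / 2 - C)

print(d_min, d_mean)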
This is not a bad question at all.
Sometimes mathematicians define the Euclidean distance between two sets A and B as the minimum distance between any pair of elements, one from each set.
You can also use the Hausdorff distance: for each element of one set take the distance to its nearest element of the other set, and then take the maximum of these nearest-neighbor distances over both directions.
Distance between two sets
In other words, you can compute the Euclidean distance between each element of set A and each element of set B, and then define the distance d(A,B) between the two sets from those pairwise distances: either the minimum over all pairs, or, for the Hausdorff distance, the largest of the nearest-neighbor distances.
The Hausdorff (maximum) distance has some nicer mathematical properties, and on the space of non-empty, compact sets (which your entities will be, since they are finite) it is a proper mathematical distance, in that it satisfies:
For all non-empty compact sets A,B,C
d(A,B) >= 0 (with d(A,B) = 0 if and only if A=B)
d(A,B) = d(B,A)
d(A,B) <= d(A,C) + d(C,B)
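If you want to try this, scipy provides the directed Hausdorff distance, and the symmetric distance described above is the maximum over both directions (the two point sets below are placeholders):

import numpy as np
from scipy.spatial.distance import directed_hausdorff

A = np.array([[1.0, 0.0], [0.0, 1.0]])
B = np.array([[2.0, 0.0], [0.0, 3.0]])

# Symmetric Hausdorff distance: the larger of the two directed distances.
d = max(directed_hausdorff(A, B)[0], directed_hausdorff(B, A)[0])
print(d)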