Analysis of Prim's Algorithm - graph-algorithm

Can anyone explain why Prim's algorithm for the minimum spanning tree problem uses the key array (i.e. key[]), and why it is important?
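PRIM_MST(G, W, R)          // G: graph, W: weight matrix, R: root vertex
  for each v in V[G]
    key[v] <- INFINITY
    pred[v] <- NIL         // pred[]: predecessor array
  key[R] <- 0
  Q <- V[G]                // Q: priority queue ordered by key[]
  while Q != EMPTY
    u <- EXTRACT-MIN(Q)
    for each v in adj[u]   // adj[]: adjacency list
      if v belongs to Q && w(u, v) < key[v]
        pred[v] <- u
        key[v] <- w(u, v)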

The key of a vertex is basically the weight of the edge that led to that vertex during the construction of the MST.
At each step, the algorithm looks for the minimum-weight edge connecting set A (the vertices already in the tree) and set B (the vertices not yet in the tree). It follows this minimum edge and sets the key of the newly reached vertex to the weight of that edge. Before a vertex is reached, key[v] holds the weight of the cheapest edge found so far between v and the tree, which is why the priority queue can be ordered by key.
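To make the role of key[] concrete, here is a minimal Python sketch of Prim's algorithm. The function name `prim_mst`, the adjacency-dict input format and the example graph are my own assumptions for illustration. A binary heap with lazy deletion stands in for the priority queue Q.

```python
import heapq
import math

def prim_mst(adj, root):
    """adj maps each vertex to a list of (neighbour, weight) pairs."""
    key = {v: math.inf for v in adj}   # cheapest known edge joining v to the tree
    pred = {v: None for v in adj}      # other endpoint of that edge
    key[root] = 0
    in_tree = set()
    heap = [(0, root)]
    while heap:
        _, u = heapq.heappop(heap)     # vertex with the smallest key
        if u in in_tree:
            continue                   # stale heap entry, skip it
        in_tree.add(u)
        for v, w in adj[u]:
            if v not in in_tree and w < key[v]:
                key[v] = w             # found a cheaper edge into v
                pred[v] = u
                heapq.heappush(heap, (w, v))
    return pred, key

# Hypothetical example graph (undirected, so each edge is listed twice).
graph = {
    "a": [("b", 4), ("c", 1)],
    "b": [("a", 4), ("c", 2), ("d", 5)],
    "c": [("a", 1), ("b", 2), ("d", 8)],
    "d": [("b", 5), ("c", 8)],
}
pred, key = prim_mst(graph, "a")
total_weight = sum(key.values())       # sum of the keys = weight of the MST
```

When the loop ends, key[v] is the weight of the tree edge (pred[v], v), so summing the keys gives the weight of the spanning tree.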


In the LambdaRank algorithm (in Learning to Rank), what does |∆NDCG| mean?

This article describes the LambdaRank algorithm for information retrieval. In formula 8 on page 6, the authors propose multiplying the gradient (lambda) by a term called |∆NDCG|.
I do understand that this term is the difference of two NDCGs when swapping two elements in the list:
the size of the change in NDCG (|∆NDCG|) given by swapping the rank positions of U1 and U2
(while leaving the rank positions of all other urls unchanged)
However, I do not understand which ordered list is considered when swapping U1 and U2. Is it the list ordered by the predictions from the model at the current iteration? Or is it the list ordered by the ground-truth labels of the documents? Or maybe the list ordered by the predictions from the model at the previous iteration, as suggested by Tie-Yan Liu in his book Learning to Rank for Information Retrieval?
Short answer: It's the list ordered by the predictions from the model at the current iteration.
Let's see why it makes sense.
At each training iteration, we perform the following steps (these are standard for any gradient-based machine learning algorithm, whether the task is classification, regression or ranking):
1. Calculate the scores s[i] = f(x[i]) returned by our model for each document i.
2. Calculate the gradient with respect to the model's weights, ∂C/∂w, back-propagated from RankNet's cost C. This gradient is the sum of all pairwise gradients ∂C[i, j]/∂w, calculated for each document pair (i, j).
3. Perform a gradient ascent step (i.e. w := w + u * ∂C/∂w, where u is the step size).
In the "Speeding up RankNet" paragraph, the notation λ[i] was introduced for the contribution of each document's computed score (using the model weights at the current iteration) to the overall gradient ∂C/∂w (at the current iteration). If we order our list of documents by the scores from the model at the current iteration, each λ[i] can be thought of as an "arrow" attached to a document, whose sign tells us in which direction, up or down, that document should be moved to increase NDCG. Again, NDCG is computed from the order predicted by our model.
Now, the problem is that the lambdas λ[i, j] for every pair (i, j) contribute equally to the overall gradient. That means the ranking of documents below, let's say, the 100th position is given the same importance as the ranking of the top documents. This is not what we want: we should prioritize having relevant documents at the very top much more than having the correct ranking below the 100th position.
That's why we multiply each of those "arrows" by |∆NDCG|: to emphasise the top of the ranking more than the bottom of our list.
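As a rough illustration, |∆NDCG| for a pair can be computed by sorting the documents by the current model scores, swapping the two documents, and comparing NDCG before and after. The sketch below assumes the common gain 2^rel - 1 and discount log2(position + 1). The function names and the toy scores and labels are hypothetical, not taken from the paper.

```python
import math

def dcg(labels_in_rank_order):
    # Gain 2^rel - 1, discounted by log2(position + 1) with positions starting at 1.
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(labels_in_rank_order))

def delta_ndcg(scores, labels, i, j):
    """|NDCG change| when documents i and j swap places in the list
    ordered by the current model scores."""
    order = sorted(range(len(scores)), key=lambda d: scores[d], reverse=True)
    ideal = dcg(sorted(labels, reverse=True))
    if ideal == 0:
        return 0.0                      # no relevant document at all
    before = dcg([labels[d] for d in order])
    pos_i, pos_j = order.index(i), order.index(j)
    order[pos_i], order[pos_j] = order[pos_j], order[pos_i]
    after = dcg([labels[d] for d in order])
    return abs(after - before) / ideal

# Toy data: current model scores and ground-truth relevance labels.
scores = [0.9, 0.8, 0.1, 0.05]
labels = [0, 1, 0, 1]
top_swap = delta_ndcg(scores, labels, 0, 1)      # misordered pair at ranks 1 and 2
bottom_swap = delta_ndcg(scores, labels, 2, 3)   # misordered pair at ranks 3 and 4
```

Both pairs are misordered in the same way, but the swap near the top changes NDCG much more, so its lambda gets a larger weight.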

Time series distance metric

In order to cluster a set of time series, I'm looking for a smart distance metric.
I've tried some well-known metrics, but none of them fits my case.
Example: let's assume that my clustering algorithm extracts these three centroids [s1, s2, s3]:
I want to put this new example [sx] in the most similar cluster:
The most similar centroid is the second one, so I need to find a distance function d that gives me d(sx, s2) < d(sx, s1) and d(sx, s2) < d(sx, s3).
Here are the results with the metrics [cosine, euclidean, minkowski, dynamic time warping]:
edit 2
User Pietro P suggested applying the distances to the cumulated version of the time series.
The solution works; here are the plots and the metrics:
Nice question! Using any standard distance on R^n (Euclidean, Manhattan or, more generally, Minkowski) over those time series cannot achieve the result you want, since those metrics are unchanged when the coordinates of R^n are permuted (in the same way for both series), while time is strictly ordered and that ordering is the phenomenon you want to capture.
A simple trick that can do what you ask is to use the cumulated version of the time series (a running sum of the values as time increases) and then apply a standard metric. Using the Manhattan metric, you would get as the distance between two time series the area between their cumulated versions.
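Here is a small sketch of that trick in plain Python. The series below are made-up single spikes at different times (the original plots are not reproduced here), and the function names are my own.

```python
from itertools import accumulate

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cumulated_manhattan(a, b):
    # Manhattan distance between the running sums of the two series.
    return manhattan(list(accumulate(a)), list(accumulate(b)))

# Hypothetical centroids: one spike each, at different times.
s1 = [1, 0, 0, 0, 0, 0]
s2 = [0, 0, 0, 1, 0, 0]
s3 = [0, 0, 0, 0, 0, 1]
sx = [0, 0, 1, 0, 0, 0]   # new series, spike right before the one in s2

plain = [manhattan(sx, c) for c in (s1, s2, s3)]
cumulated = [cumulated_manhattan(sx, c) for c in (s1, s2, s3)]
nearest = min(range(3), key=lambda k: cumulated[k])   # index of the closest centroid
```

On the raw series all three centroids are equally far from sx, because the metric does not care when the spike happens. On the cumulated series, the distance grows with the time gap between the spikes, so s2 comes out closest.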
Another approach would be to use DTW (dynamic time warping), which is an algorithm for computing the similarity between two temporal sequences. Full disclosure: I wrote a Python package for this purpose called trendypy, which you can install via pip (pip install trendypy). Here is a demo on how to use the package. You're basically computing the total minimum distance for different combinations to set the cluster centers.
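For reference, a textbook dynamic-programming version of DTW fits in a few lines. This is a generic sketch with an absolute-difference local cost, not the trendypy API, and the example series are hypothetical.

```python
import math

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: repeat b[j-1], repeat a[i-1], advance both.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]

base = [0, 1, 2, 1, 0]
shifted = [0, 0, 1, 2, 1, 0]   # same shape, delayed by one step
flat = [1, 1, 1, 1, 1]

d_shifted = dtw_distance(base, shifted)
d_flat = dtw_distance(base, flat)
```

Note that DTW deliberately forgives time shifts: the delayed copy is at distance zero. Whether that is desirable depends on whether the timing itself is what should separate the clusters.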
What about using the standard Pearson correlation coefficient? Then you can assign the new point to the cluster with the highest coefficient.
import scipy.stats
correlation, p_value = scipy.stats.pearsonr(new_time_series, centroid)  # pearsonr returns (coefficient, p-value)
Pietro P's answer is just a special case of applying a convolution to your time series.
If I gave the kernel:
I would get a cumulative series.
Adding a convolution works because you're giving each data point information about its neighbours, so it's now order dependent.
It might be interesting to try a Gaussian convolution or other kernels.
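A quick sketch of this idea: the kernel from the answer is not shown above, so the snippet assumes an all-ones kernel, which is one kernel that turns a causal convolution into a running sum. The helper name, the test series and the smoothing kernel are made up for illustration.

```python
from itertools import accumulate

def convolve_causal(series, kernel):
    """First len(series) terms of the discrete convolution series * kernel,
    so output[t] only depends on series[t], series[t-1], ..."""
    out = []
    for t in range(len(series)):
        taps = min(t + 1, len(kernel))
        out.append(sum(kernel[k] * series[t - k] for k in range(taps)))
    return out

series = [0, 0, 1, 0, 2, 0]

# Assumed kernel: all ones, as long as the series. This yields the cumulative sum.
cumulative = convolve_causal(series, [1] * len(series))

# Another kernel: a short decaying window, so only recent values are mixed in.
smoothed = convolve_causal(series, [0.5, 0.3, 0.2])
```

Any standard metric applied to the convolved series then becomes sensitive to the order of the values, since each output point mixes in its predecessors.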

How to combine various distance functions into one given the following dataset?

I have a few distance functions, each of which returns a distance between two images. I want to combine these distances into a single distance using a weighted score, e.g. a*x1 + b*x2 + c*x3 + d*x4, and I want to learn the weights automatically so that my test error is minimised.
For this purpose I have a labeled dataset of image triplets (a, b, c) such that a is closer to b than it is to c,
i.e. d(a,b) < d(a,c).
I want to learn weights such that this ordering of the triplets is reproduced as accurately as possible (i.e. the weighted linear score is lower for a and b than for a and c).
What sort of machine learning algorithm can be used for this task, and how can it be achieved?
Hopefully I understand your question correctly, but it seems that this could be solved more easily with constrained optimization directly, rather than classical machine learning (the algorithms of which are often implemented via constrained optimization, see e.g. SVMs).
As an example, a possible objective function could be:
argmin_{w} || e ||_2 + lambda || w ||_2
where w is your weight vector (Oh god why is there no latex here), e is the vector of errors (one component per training triplet), lambda is some tunable regularizer constant (could be zero), and your constraints could be:
max{d(I_p,I_r)-d(I_p,I_q),0} <= e_j for jth (p,q,r) in T s.t. d(I_p,I_r) <= d(I_p,I_q)
for the jth constraint, where I_i is image i, T is the training set, and
d(u,v) = sum_{w_i in w} w_i * d_i(u,v)
with d_i being your ith distance function.
Notice that e is measuring how far your chosen weights are from satisfying all the chosen triplets in the training set. If the weights preserve ordering of label j, then d(I_p,I_r)-d(I_p,I_q) < 0 and so e_j = 0. If they don't, then e_j will measure the amount of violation of training label j. Solving the optimization problem would give the best w; i.e. the one with the lowest error.
If you're not familiar with linear/quadratic programming, convex optimization, etc... then start googling :) Many libraries exist for this type of thing.
On the other hand, if you would prefer a machine learning approach, you may be able to adapt some metric learning approaches to your problem.
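As a rough illustration of the optimization idea, here is a dependency-free sketch. It does not call a QP solver. Instead it runs projected subgradient descent on a hinge-loss relaxation of the constraints. The margin (added so that all-zero weights are not a trivial optimum), the non-negativity projection, the hyperparameters and the toy triplets are all my own assumptions.

```python
def combined(w, dists):
    # Weighted sum of the base distances for one image pair.
    return sum(wi * di for wi, di in zip(w, dists))

def learn_weights(triplets, n_dist, margin=1.0, lam=0.01, lr=0.1, epochs=200):
    """triplets: list of (d_ab, d_ac), each a list of n_dist base distances,
    labelled so that a should end up closer to b than to c."""
    w = [1.0] * n_dist
    for _ in range(epochs):
        grad = [lam * wi for wi in w]               # L2 regularizer on w
        for d_ab, d_ac in triplets:
            violation = margin + combined(w, d_ab) - combined(w, d_ac)
            if violation > 0:                       # ordering violated or too tight
                for i in range(n_dist):
                    grad[i] += (d_ab[i] - d_ac[i]) / len(triplets)
        # Gradient step, then clip to keep the weights non-negative.
        w = [max(0.0, wi - lr * g) for wi, g in zip(w, grad)]
    return w

# Toy data: distance 0 agrees with the labels, distance 1 is misleading.
triplets = [
    ([1.0, 5.0], [3.0, 1.0]),
    ([2.0, 4.0], [5.0, 2.0]),
    ([1.0, 3.0], [4.0, 3.0]),
]
w = learn_weights(triplets, n_dist=2)
satisfied = [combined(w, d_ab) < combined(w, d_ac) for d_ab, d_ac in triplets]
```

With equal starting weights the first triplet is ordered wrongly. After training, the misleading distance is down-weighted enough for all three orderings to hold. A proper LP/QP formulation solved with a library would be the more principled route.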

In DBSCAN, how to determine border points?

In DBSCAN, a core point is defined as having more than MinPts points within Eps.
So if MinPts = 4, a point with a total of 5 points within Eps is definitely a core point.
How about a point with 4 points (including itself) within Eps? Is it a core point or a border point?
Border points are points that are (in DBSCAN) part of a cluster, but not dense themselves (i.e. every cluster member that is not a core point).
In the followup algorithm HDBSCAN, the concept of border points was discarded.
Campello, R. J. G. B.; Moulavi, D.; Sander, J. (2013).
Density-Based Clustering Based on Hierarchical Density Estimates.
Proceedings of the 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2013. Lecture Notes in Computer Science 7819. p. 160.
which states:
Our new definitions are more consistent with a statistical interpretation of clusters as connected components of a level set of a density [...] border objects do not technically belong to the level set (their estimated density is below the threshold).
Actually, I just re-read the original paper and Definition 1 makes it look like the core point belongs to its own eps neighborhood. So if minPts is 4, then a point needs at least 3 others in its eps neighborhood.
Notice that Definition 1 says N_Eps(p) = {q ∈ D | dist(p,q) ≤ Eps}. If the point were excluded from its own Eps neighborhood, it would have said N_Eps(p) = {q ∈ D | dist(p,q) ≤ Eps and p != q}, where != means "not equal to".
This is also reinforced by the authors of DBSCAN in their OPTICS paper in Figure 4.
So I think the SciKit interpretation is correct and the Wikipedia illustration is misleading in
This largely depends on the implementation. The best way is to just play with the implementation yourself.
In the original DBSCAN [1] paper, the core point condition is given as N_Eps >= MinPts, where N_Eps is the Epsilon neighborhood of a given data point, and the point itself is excluded from its own N_Eps.
Following your example, if MinPts = 4 and N_Eps = 3 (or 4 including itself, as you say), then they don't form a cluster according to the original paper. On the other hand, the scikit-learn [2] implementation of DBSCAN works differently: it counts the point itself when forming a group. So for MinPts = 4, four points are needed in total to form a cluster.
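Since the answers disagree on whether a point counts as its own neighbour, the sketch below makes that choice an explicit parameter. It is a brute-force illustration with a hypothetical function name and toy points, not the scikit-learn implementation.

```python
import math

def classify_points(points, eps, min_pts, count_self=True):
    """Label every point 'core', 'border' or 'noise'.
    count_self decides whether a point is counted in its own Eps neighbourhood."""
    def neighbours(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]   # always contains i itself

    def size(i):
        return len(neighbours(i)) - (0 if count_self else 1)

    core = {i for i in range(len(points)) if size(i) >= min_pts}
    labels = []
    for i in range(len(points)):
        if i in core:
            labels.append("core")
        elif any(j in core for j in neighbours(i)):
            labels.append("border")     # not dense itself, but next to a core point
        else:
            labels.append("noise")
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (2.2, 1), (5, 5)]
with_self = classify_points(points, eps=1.5, min_pts=4, count_self=True)
without_self = classify_points(points, eps=1.5, min_pts=4, count_self=False)
```

With count_self=True, the corner points, which have exactly 4 points within Eps including themselves, are core points. With count_self=False they drop to border points, which is exactly the difference discussed above.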
[1] Ester, Martin; Kriegel, Hans-Peter; Sander, Jörg; Xu, Xiaowei (1996). "A density-based algorithm for discovering clusters in large spatial databases with noise."

Feedback on algorithm for Steiner Tree with restrictions

For an assignment, I have to create a Steiner Tree. However, this is not a typical Steiner Tree, as the graph structure we're required to use does not allow insertion of new vertices. Rather, the test cases define a graph structure of N vertices and M edges while specifically marking X vertices as target nodes. These are the nodes we have to span while using some, none or all of the unmarked vertices in the graph.
My solution to this problem is:
Implement Dijkstra's algorithm to find the shortest paths between all the target vertices
For each of the shortest paths 1:n
    Extract all vertices of the currently selected path into a set
    Extract all remaining vertices into a set
    For each vertex of the currently selected path 1:m
        Execute Dijkstra to find the shortest path between the current vertex and the other paths' vertices
        If this creates a spanning tree, save the path and its length in a priority queue sorted by length
Pop the top of the priority queue and return the path
My issue is that this is an exhaustive search that uses the initial application of Dijkstra to create a reduced set of possible start-end vertices for a shorter path than a minimum spanning tree.
Is there a heuristic or other algorithm that may solve this problem?
With some help, I worked out this answer for a similar problem that I had. Rather than adding new vertices as in a spatial Steiner tree problem, the new Steiner points in this graph are the vertices that lie along the paths between the marked nodes. For a graph with N vertices, M edges, X required vertices, and S found vertices (vertices along our paths):
Compute all-pairs shortest paths (Floyd-Warshall, Johnson's, whatever)
for k in X
    remove k from X, insert k into S
    for v in (X + S)   // both sets
        find the shortest distance from k to v; keep the path P to the nearest v
    for u in P   // all vertices on the path
        insert u into S
        if u exists in X, remove u from X
Now for the wall of text as to what this algorithm does. We pick a vertex k in X, then find the nearest other vertex, either in the target set X or in the result set S, and call it v. Then we follow the path of nodes from k to v, inserting them into our result set. Finally, we double-check that any vertices in X that were on the path (shouldn't happen) are removed from X.
Any new vertex that you want to add, c, will have a minimum distance to some node already in your result set S. Since the nodes already in S are the minimum distance apart, it follows that c will be the minimum distance from any point in S to c. For example, if you have three nodes, A, B, and C, if A and B are already found to be a minimum distance apart, adding C fulfills the requirement that it is the minimum distance from B, and the minimum distance path from A to C goes through B.
I did some research on the discrete Steiner Tree problem (which is what this is), and this is the best brute force solution that I found. The main problem is going to be the O(n^3) time it takes to do all pairs shortest paths, but then the construction of the minimum tree should be straightforward and quick, since you just need to look up distance information. The implementation I wound up working with is outlined nicely on wikipedia.
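Here is a compact Python sketch in the spirit of the steps above. It is a heuristic and is not guaranteed to find the optimal tree. As my own simplification, each round attaches the remaining target closest to the tree built so far. It assumes a connected, undirected graph with positive weights and vertices numbered 0..n-1. All names and the example graph are hypothetical.

```python
import math

def floyd_warshall(n, edges):
    dist = [[math.inf] * n for _ in range(n)]
    nxt = [[None] * n for _ in range(n)]     # nxt[i][j]: first hop on a shortest i->j path
    for v in range(n):
        dist[v][v], nxt[v][v] = 0, v
    for u, v, w in edges:                    # undirected edges
        if w < dist[u][v]:
            dist[u][v] = dist[v][u] = w
            nxt[u][v], nxt[v][u] = v, u
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt

def steiner_heuristic(n, edges, targets):
    dist, nxt = floyd_warshall(n, edges)
    in_tree = {targets[0]}                   # S: vertices found so far
    remaining = set(targets[1:])             # X: targets still to connect
    tree_edges = set()
    while remaining:
        # Closest (tree vertex, remaining target) pair.
        s, k = min(((s, k) for s in in_tree for k in remaining),
                   key=lambda pair: dist[pair[0]][pair[1]])
        u = s
        while u != k:                        # walk the shortest path s -> k
            v = nxt[u][k]
            tree_edges.add((min(u, v), max(u, v)))
            in_tree.add(v)
            remaining.discard(v)             # a target met on the way is done too
            u = v
    return tree_edges

# Targets 0, 1, 2 are pairwise joined by edges of weight 3; vertex 3 is an unmarked hub.
edges = [(0, 3, 1), (1, 3, 1), (2, 3, 1), (0, 1, 3), (1, 2, 3), (0, 2, 3)]
tree = steiner_heuristic(4, edges, targets=[0, 1, 2])
weight = {(min(u, v), max(u, v)): w for u, v, w in edges}
total = sum(weight[e] for e in tree)
```

In this example the heuristic routes everything through the unmarked hub for a total weight of 3, whereas a spanning tree restricted to the direct edges between the targets would cost 6.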
