TSP Where Vertices Can be Visited Multiple Times - graph-algorithm

I am looking to solve a problem where I have a weighted directed graph and I must start at the origin, visit all vertices at least once and return to the origin in the shortest path possible. Essentially this would be a classic example of TSP, except I DO NOT have the constraint that each vertex can only be visited once. In my case any vertex excluding the origin can be visited any number of times along the path, if this makes the path shorter. So for example in a graph containing the vertices V1, V2, V3 a path like this would be valid, given that it is the shortest path:
ORIGIN -> V1 -> V2 -> V1 -> V3 -> V1 -> ORIGIN
As a result, I am a bit stuck on what approach to take in order to solve this, as a classic dynamic programming algorithm approach which is usually used to solve TSP problems in exponential time is not suitable.

The typical approach is to create a distance matrix that gives the shortest-path distance between any two nodes. So d(i,j) = shortest path (following the edges of the network) from i to j. This can be done using Dijkstra's algorithm.
Now just solve a classical TSP with distances d(i,j). Your TSP doesn't "know" that the actual route followed might involve visiting a node multiple times. At the same time, it will ensure that the vehicle stops at every node.
Now, as for efficiency: As #Codor points out, TSP is NP-hard and so is your variant of it, so you are not going to find a provably optimal, polynomial-time algorithm. However, there are still many, many good algorithms (both heuristic and exact) for TSP, and most of them should be suitable for your problem. (In general, DP is not the way to go for TSP.)

To answer the question in part, the problem described in the question does not admit a polynomial-time algorithm unless P=NP by the following argument. Clearly, the proposed problem includes instances which are Euclidean. However, no optimal solution to a Euclidean instance has repeated nodes, as such a solution can be improved by deleting additional nodes, using the triangle inequality. However, according to the Wikipedia article on TSP, Euclidean TSP is still NP-hard. This means that any polynomial-time algorithm for the problem in the question would be able to solve the Euclidean TSP to optimality on polynomial time, which is impossible unless P=NP.

Related

Can I use Breadth-First-Search on weighted graphs if I modify it?

I am having a discussion with a friend if the following will work:
We recently learned in a lecture about Breadth-First-Search. I know that it is a special case of Dijkstra where each edge weight is set to one. Assume now we are given a graph where the edges have integer weights of more than one. Then I would modify this graph by introducing additional vertices and connecting them by edges with weight one, e.g. assume we have an edge of weight 3 connecting the vertices u and v, then I would introduce dummy-vertices d1, d2, remove the edge connecting u and v and instead add edges {u, d1}, {d1, d2}, {d2,v} of weight one.
If I modify my whole graph this way and then apply breadth-first search starting from one of the original vertices, wouldn't this work as well?
Thank you very much in advance!
Since BFS is guaranteed to return an optimal path on unweighted graphs, and you've created the unweighted equivalent of your original graph, you'll be guaranteed to get the shortest path.
What you lose by doing this over Dijkstra's algorithm is runtime optimality. Now the runtime of your algorithm is dependent on the edge weights, whereas Dijkstra's is only dependent on the number of edges.
This sort of thought experiment is a great way to understand how Dijkstra's algorithm works (eg. how would you modify your algorithm to not require creating a new graph? Or not take 100 steps for an edge with weight 100?). In fact this is probably how Dijkstra discovered the algorithm to begin with.

If a DBSCAN algorithm is working correctly, is it possible to result in a cluster with less than minPoints members?

I am new to using the DBSCAN algorithm.
Quick summary; it has two parameters:
epsilon - to specify the acceptable "distance" between two points, under which they can be considered close enough to cluster.
minPoints - to specify the minimum number of points that must fall with the distance epsilon to constitute a cluster. If there aren't enough points together, it's just labelled as noise.
I'm using somebody else's DBSCAN algorithm and I have the source code, which I only kind of understand. I was hoping I could use it as-is but then I found some behaviour I wasn't expecting.
I specified a value of 6 for minPoints and yet in my results I got a cluster with only 2 points.
From debugging, I think I can see what's happening. It looks like when the point is examined it has 16 neighbours at a distance below epsilon, so it should qualify as a cluster. Later on, the algorithm sees that 14 of those neighbours had already been assigned to a different cluster, so this cluster ends up with only 2 points in it.
Is minPoints supposed to be strictly enforced?
Is this how a healthy DBSCAN algorithm is supposed to work or is it a bug I need to fix before proceeding?
For anybody who finds this question through Google, the correct answer is Yes. It is possible for a correctly functioning DBSCAN implementation to create a cluster with less than minPoints members. It is explained more fully on this other Stack Overflow question:
Can the DBSCAN algorithm create a cluster with less than minPts?

Longest non repetitive path in a graph?

I was working on to find the shortest path between two nodes in undirected acyclic graph using Dijkstra's algorithms. I wanted to find the longest path that is possible by the same algorithm. I also want to avoid few routes with 0 edge values. How do I do that using Dijkstra's algorithm?
Now after searching through Stackoverflow I came across one given solution which just states that we need to modify the relaxation part to find the longest path.
Like:
if(distanceValueOfNodeA< EdgeValueofNodeBtoA )
{
distanceValueOfNodeA = EdgeValueofNodeBtoA;
}
But we are not considering adding distanceValueOfNodeB
But for shortest paths we calculate:
distanceValueOfNodeA = distanceValueOfNodeB+EdgeValueofNodeBtoA
Should we ignore distanceValueOfNodeB to calculate distanceValueOfNodeA ?
I am sorry to disappoint you but that problem is known as Longest path in a graph and there isn't an efficient algorithm to solve it, so niether Djikstra algorithm with any modification can.
It belongs to a class of problems known as NP-hard,those are problems for which there isn't (at the moment) an algorithm to solve them in faster time complexity compared to exponential.

Grouping points that represent lines

I am looking for an Algorithm that is able to solve this problem.
The problem:
I have the following set points:
I want to group the points that represents a line (with some epsilon) in one group.
So, the optimal output will be something like:
Some notes:
The point belong to one and only line.
If the point can be belong to two lines, it should belong to the strongest.
A line is considered stronger that another when it has more belonging points.
The algorithm should not cover all points because they may be outliers.
The space contains many outliers it may hit 50% of the the total space.
Performance is critical, Real-Time is a must.
The solutions I found till now:
1) Dealing with it as clustering problem:
The main drawback of this method is that there is no direct distance metric between points. The distance metric is on the cluster itself (how much it is linear). So, I can not use traditional clustering methods and I have to (as far as I thought) use some kind of, for example, clustering us genetic algorithm where the evaluation occurs on the while cluster not between two points. I also do not want to use something like Genetic Algorithm While I am aiming real-time solution.
2) accumulative pairs and then do clustering:
While It is hard to make clustering on points directly, I thought of extracting pairs of points and then try to cluster them with others. So, I have a distance between two pairs that can represents the linearity (two pairs are in real 4 points).
The draw-back of this method is how to choose these pairs? If I depend on the Ecledian-Distance between them, it may not be accurate because two points may be so near to each other but they are so far from making a line with others.
I appreciate any solution, suggest, clue or note. Please you may ask about any clarification.
P.S. You may use any ready OpenCV function in thinking of any solution.
As Micka advised, I used Sequential-RANSAC to solve my problem. Results were fantastic and exactly as I want.
The idea is simple:
Apply RANSAC with fit-line model on the points.
Delete all points that are in-liers of the output of RANSAC.
While there are 2 or more points go to 1.
I have implemented my own fit-line RANSAC but unfortnantly I can not share code because it belongs to the company I work for. However, there is an excellent fit-line RANSAC here on SO that was implemented by Srinath Sridhar. The link of the post is : RANSAC-like implementation for arbitrary 2D sets.
It is easy to make a Sequential-RANSAC depending on the 3 simple steps I mentioned above.
Here are some results:

Graph theory - learn cost function to find optimal path

This is a supervised learning problem.
I have a directed acyclic graph (DAG). Each edge has a vector of features X, and each node (vertex) has a label 0 or 1. The task is to find a cost function w(X), so that the shortest path between any pair of nodes has the highest ratio of 1s to 0s (minimum classification error).
The solution must generalize well. I tried logistic regression, and the learned logistic function predicts fairly well the label of a node giving the features of a incoming edge. However, the graph's topology is not taken into account by that approach, so the solution in the whole graph is non-optimal. In other words, the logistic function is not a good weight function given the problem setup above.
Although my problem setup is not the typical binary classification problem setup, here is a good intro to it:
http://en.wikipedia.org/wiki/Supervised_learning#How_supervised_learning_algorithms_work
Here are some more details:
Each feature vector X is a d-dimensional list of real numbers.
Each edge has a vector of features. That is, given the set of edges E = {e1, e2, .. en} and set of feature vectors F = {X1, X2 ... Xn}, then edge ei is associated to vector Xi.
It is possible to come up with a function f(X), so that f(Xi)
gives the likelihood that edge ei points to a node labeled with a 1.
An example of such function is the one I mentioned above, found through logistic
regression. However, as I mentioned above, such function is non-optimal.
SO THE QUESTION IS:
Given the graph, a starting node and an finish node, how do I learn the optimal cost function w(X), so that the ratio of nodes 1s to 0s is maximized (minimum classification error)?
This is not really an answer, but we need to clarify the question. I might come back later for a possible answer though.
Below is an example DAG.
Suppose the red node is the starting node, and the yellow one is the end node. How do you define the shortest path in terms of
the highest ratio of 1s to 0s (minimum classification error) ?
Edit: I add names for each node and two example names for the top two edges.
It seems to me you cannot learn such a cost function that takes feature vectors as inputs and whose output (edge weights? or whatever) can guide you to take a shortest path toward any node considering the graph topology. The reason is stated below:
Let's assume you don't have the feature vectors you stated. Given a graph as above, if you want to find all-pair-shortest-path with respective to the ratio of 1s to 0s, it's perfect to use Bellman equation or more specifically Dijkastra plus a proper heuristic function (e.g., percentage of 1s in the path). Another possible model-free approach is to use q-learning in which we get reward +1 for visiting a 1 node and -1 for visiting a 0 node. We learn a lookup q-table for each target node one at a time. Finally we have the all-pair-shortest-path when all nodes are treated as target nodes.
Now suppose, you magically obtained the feature vectors. Since you are able to find the optimal solution without those vectors, how come they will help when they exist?
There is one possible condition that you can use the feature vector to learn a cost function which optimize edge weights, that is, the feature vectors are dependent on the graph topology (the links between nodes and the position of 1s and 0s). But I did not see this dependency in your description at all. So I guess it does not exist.
This looks like a problem where a genetic algorithm has excellent potential. If you define the desired function as e.g. (but not limited to) a linear combination of the features (you could add quadratic terms, then cubic, ad inifititum), then the gene is the vector of coefficients. The mutator can be just a random offset of one or more coefficients within a reasonable range. The evaluation function is just the average ratio of 1's to 0's along shortest paths for all pairs according to the current mutation. At each generation, pick the best few genes as ancestors and mutate to form the next generation. Repeat until the ueber gene is at hand.
I believe your question is very close to the field of Inverse Reinforcement Learning, where you take certain "expert demonstrations" of optimal paths and try to learn a cost function such that your planner (A* or some reinforcement learning agent) outputs the same path as the expert demonstration. This training is done in an iterative way. I think that in your case, the expert demonstrations could be created by you to be paths that go through maximum number of 1 labelled edges. Here is a link to a good paper on the same: Learning to Search: Functional Gradient Techniques for Imitation Learning. It is from the robotics community where motion planning is usually setup as a graph-search problem and learning cost functions is essential for demonstrating desired behavior.

Resources