Will all TSP algorithms give the same optimum route? - a-star

I was just wondering if all algorithms for the TSP will give the same optimum routes? I thought that this would be the case but ive implemented branch and bound and A* and they both give very different results to the same input, I was just wondering if this is normal?

The route my differ, but the cost of all optimal solution should be the same.
If your A* solution is more expensive, than your heuristic is wrong.
Have a look at wikipedia A* algorithm for proofs that it always finds an optimal solution.

No. Provided more than one optimal route exists, different algorithms will not necesarily find the same path. It will depend on the implementation, and I assume it will also depend on how you label the graph, so that different labelings will make the same algorithm find different routes.

Related

Cell neighborhood analysis with relative distances in Cytoscape

Is it possible to show a neighborhood analysis with relative distances by defined scores in cytoscape? As an example: 2=not in the direct neighborhood, 1=in the direct neighborhood, 0=random permutation.
To take larger values is a very good and I will also implement this in my analysis. Unfortunately, I'm still unclear which method I should use for the visualization of the relative distances. According to what I had read the "force-directed" paradigm should be the best solution for my analysis. Do someone agree with that?
Thanks in advance and best regards Joschua
Sure, you could certainly do something like that. The real question is what would you want the network to look like? If you are trying to see some sort of clustering effect by using your relative distances, I would use different numbers -- maybe 100=not in the direct neighborhood, 1=in the direct neighborhood, and blank for random permutations. If you want to use your relative distances for something else (e.g. edge type or color), then the scaling shouldn't matter.
-- scooter

Comparison of two normal distribution

I have two normally distributed samples. I want to know how close or similar it is. I tried few methods to find the similarity, like z-score and bhattacharyya distance.
Bhattacharyya distance didn't work for me. It gives the same distance if the standard deviation of two samples is same. It doesn't change with change in mean.
I want to know whether any method is available that take the samples or its mean and standard deviation to find the similarity or similarity rank something like this.
I am not from mathematics background, so please ignore the terminology mistakes and let me know if any clarification is required.
I assume you're not looking for a relationship between the two samples, where a correlation coefficient would be appropriate?
I've been investigating a similar question for my current data and am looking at the Mahalanobis distance and the Earthmovers distance.
I found this post from a different forum which gave me a few ideas

Grouping points that represent lines

I am looking for an Algorithm that is able to solve this problem.
The problem:
I have the following set points:
I want to group the points that represents a line (with some epsilon) in one group.
So, the optimal output will be something like:
Some notes:
The point belong to one and only line.
If the point can be belong to two lines, it should belong to the strongest.
A line is considered stronger that another when it has more belonging points.
The algorithm should not cover all points because they may be outliers.
The space contains many outliers it may hit 50% of the the total space.
Performance is critical, Real-Time is a must.
The solutions I found till now:
1) Dealing with it as clustering problem:
The main drawback of this method is that there is no direct distance metric between points. The distance metric is on the cluster itself (how much it is linear). So, I can not use traditional clustering methods and I have to (as far as I thought) use some kind of, for example, clustering us genetic algorithm where the evaluation occurs on the while cluster not between two points. I also do not want to use something like Genetic Algorithm While I am aiming real-time solution.
2) accumulative pairs and then do clustering:
While It is hard to make clustering on points directly, I thought of extracting pairs of points and then try to cluster them with others. So, I have a distance between two pairs that can represents the linearity (two pairs are in real 4 points).
The draw-back of this method is how to choose these pairs? If I depend on the Ecledian-Distance between them, it may not be accurate because two points may be so near to each other but they are so far from making a line with others.
I appreciate any solution, suggest, clue or note. Please you may ask about any clarification.
P.S. You may use any ready OpenCV function in thinking of any solution.
As Micka advised, I used Sequential-RANSAC to solve my problem. Results were fantastic and exactly as I want.
The idea is simple:
Apply RANSAC with fit-line model on the points.
Delete all points that are in-liers of the output of RANSAC.
While there are 2 or more points go to 1.
I have implemented my own fit-line RANSAC but unfortnantly I can not share code because it belongs to the company I work for. However, there is an excellent fit-line RANSAC here on SO that was implemented by Srinath Sridhar. The link of the post is : RANSAC-like implementation for arbitrary 2D sets.
It is easy to make a Sequential-RANSAC depending on the 3 simple steps I mentioned above.
Here are some results:

A*: Finding a better solution for 15-square puzzle with one given solution

Given that there is a 15-square puzzle and we will solve the puzzle using a-star search. The heuristic function is Manhattan distance.
Now a solution is provided by someone with cost T and we are not sure if this solution is optimal. With this information provided,
Is it possible to find a better solution with cost < T?
Is it possible to optimize the performance of searching algorithm?
For this question, I have considered several approaches.
h(x) = MAX_INT if g(x) >= T. That is, the f(x) value will be maximum if the solution is larger than T.
Change the search node as CLOSED state if g(x) >= T.
Is it possible to find a better solution?
You need to know if T is the optimal solution. If you do not know the optimal solution, use the average cost; a good path is better than the average. If T is already better than average, you don't need to find a new path.
Is it possible to optimize the performance of the searching algorithm?
Yes. Heuristics are assumptions that help algorithms to make good decisions. The A* algorithm makes the following assumptions:
The best path costs the least (Djikstra's Algorithm - stay near origin of search)
The best path is the most direct path (Greedy Search - minimize distance to goal)
Good heuristics vastly improve performance (A* is useful for this reason). Bad heuristics lead the search away from good solutions and obliterate performance. My advice is to know the game you are searching; in chess, it's generally best to avoid losing a queen, so that may be a good heuristic to use.
Heuristics will have the largest impact on performance, especially in the case of a 15x15 search space. In larger search spaces (2000x2000), good use of high efficiency data structures like arrays and integers may improve performance.
Potential solutions
Both the solutions you provide are effectively the same; if the path isn't as good as the other paths you have, ignore them. Search algorithms like A* do this for you, as j_random_hacker has said in a roundabout manner.
The OPEN list is a set of possible moves; select the best and ignore the rest. The CLOSED list is the set of moves that have already been selected, not the ones you wish to ignore.
(1) d(x) = Djikstra's Algorithm
(2) g(x) = Greedy Search
(3) a*(x) = A* Algorithm = d(x) + g(x)
To make your A* more greedy (prefer suboptimal but fast solutions), multiply the cost of g(x) to favour a greedy search; (4) a*(x) = d(x) + 1.1 * g(x)
I actually tested this in to a search space of 1500x2000. (3), a standard A*, took about 5 seconds to find the goal on the opposite side. (4) took only milliseconds to find the goal, demonstrating the value of using heuristics well.
You may also add other heuristics to A*, such as:
Depth-first search (prefer a greater amount of moves)
Bread-first (prefer a smaller amount of moves)
Stick to Roads (if terrain determines movement speed, increase the cost of choosing bad terrain)
Stay out of enemy territory (if you want to avoid losing units, don't put them in harms way)

What algorithm would you use for clustering based on people attributes?

I'm pretty new in the field of machine learning (even if I find it extremely interesting), and I wanted to start a small project where I'd be able to apply some stuff.
Let's say I have a dataset of persons, where each person has N different attributes (only discrete values, each attribute can be pretty much anything).
I want to find clusters of people who exhibit the same behavior, i.e. who have a similar pattern in their attributes ("look-alikes").
How would you go about this? Any thoughts to get me started?
I was thinking about using PCA since we can have an arbitrary number of dimensions, that could be useful to reduce it. K-Means? I'm not sure in this case. Any ideas on what would be most adapted to this situation?
I do know how to code all those algorithms, but I'm truly missing some real world experience to know what to apply in which case.
K-means using the n-dimensional attribute vectors is a reasonable way to get started. You may want to play with your distance metric to see how it affects the results.
The first step to pretty much any clustering algorithm is to find a suitable distance function. Many algorithms such as DBSCAN can be parameterized with this distance function then (at least in a decent implementation. Some of course only support Euclidean distance ...).
So start with considering how to measure object similarity!
In my opinion you should also try expectation-maximization algorithm (also called EM). On the other hand, you must be careful while using PCA because this algorithm may reduce the dimensions relevant to clustering.

Resources