I work at a delivery company. We currently plan routes of 50+ locations by hand.
I have been thinking about using the Google Maps API to solve this, but I have read that there is a 24-waypoint limit.
Our server currently runs Rails, so I am thinking about a Ruby script that would take the coordinates of the 50+ locations and output a reasonable solution.
What algorithm would you use to approach this problem?
Is Ruby a good programming language to solve this type of problem?
Do you know of any existing ruby script?
This might be what you are looking for:
Warning:
this site gets flagged by Firefox as an attack site, but it doesn't appear to be one. In fact I have used it before without a problem.
[Check revision history for URL]
rubyquiz seems to be down (it has been for a while), but you can still view that page via the Wayback Machine at archive.org:
http://web.archive.org/web/20100105132957/http://rubyquiz.com/quiz142.html
Even with the DP solution mentioned in another answer, 50+ locations means on the order of 10^15 operations. So you're going to have to look at approximate solutions, which are probably acceptable given that you currently produce the routes by hand. See http://en.wikipedia.org/wiki/Travelling_salesman_problem#Heuristic_and_approximation_algorithms
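To make that concrete, here is a rough sketch (mine, not from the linked article) of the simplest heuristic pair: a nearest-neighbour tour followed by 2-opt improvement. It assumes each location is a plain [x, y] coordinate pair:

```ruby
# Straight-line distance between two [x, y] points.
def distance(a, b)
  Math.hypot(a[0] - b[0], a[1] - b[1])
end

# Greedy construction: start at point 0, always visit the nearest
# unvisited point next. Fast, but usually noticeably above optimal
# on its own.
def nearest_neighbour_tour(points)
  unvisited = (1...points.size).to_a
  tour = [0]
  until unvisited.empty?
    last = points[tour.last]
    nxt  = unvisited.min_by { |i| distance(last, points[i]) }
    tour << unvisited.delete(nxt)
  end
  tour
end

# 2-opt improvement: repeatedly uncross pairs of edges while doing so
# shortens the tour. Usually recovers most of the greedy tour's excess.
def two_opt!(tour, points)
  improved = true
  while improved
    improved = false
    (0...tour.size - 1).each do |i|
      (i + 2...tour.size).each do |j|
        a, b = points[tour[i]], points[tour[i + 1]]
        c, d = points[tour[j]], points[tour[(j + 1) % tour.size]]
        if distance(a, b) + distance(c, d) > distance(a, c) + distance(b, d)
          tour[i + 1..j] = tour[i + 1..j].reverse # uncross the two edges
          improved = true
        end
      end
    end
  end
  tour
end
```

For 50 locations this should run in well under a second, and you could then feed the resulting order to the Maps API in 24-waypoint chunks.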
Here are a couple of tricks:
1: Cluster locations that are relatively close together and collapse each cluster into a single node in your main graph. This lets you be greedy without too much work.
2: Use an approximation algorithm.
2a: My favorite is bitonic tours. They're pretty easy to hack up.
See Update
Here's a Python lib with a bitonic tour, and here's another.
Let me go look for a Ruby one. I'm having trouble finding more than just RGL, which has efficiency issues...
Update
In your case, the minimum spanning tree attack should be effective. I can't think of a case where your cities wouldn't meet the triangle inequality, which means there should be a reasonably fast and decent approximation: the tour an MST walk produces is at most twice the optimal length. Particularly if the distance is Euclidean, which I think, again, it must be.
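A sketch of that attack (assuming a distance(a, b) helper over [x, y] pairs, like the one above): build the minimum spanning tree with Prim's algorithm, then take a preorder walk of it. The walk crosses each tree edge at most twice, and the triangle inequality guarantees the short-cuts never lengthen it, which is where the within-2x bound comes from.

```ruby
# MST 2-approximation for metric TSP: Prim's algorithm plus a preorder
# walk of the resulting tree. O(n^3) as written; fine for ~50 points.
def mst_tour(points)
  n = points.size
  in_tree  = [true] + [false] * (n - 1)
  children = Hash.new { |h, k| h[k] = [] }
  (n - 1).times do
    outside = (0...n).reject { |i| in_tree[i] }
    inside  = (0...n).select { |i| in_tree[i] }
    # Cheapest edge crossing from the tree to a node outside it.
    node, parent = outside.product(inside)
                          .min_by { |o, i| distance(points[o], points[i]) }
    children[parent] << node
    in_tree[node] = true
  end
  # Preorder (depth-first) walk gives the visiting order.
  tour, stack = [], [0]
  until stack.empty?
    v = stack.pop
    tour << v
    children[v].reverse_each { |c| stack << c }
  end
  tour
end
```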
One optimized solution uses dynamic programming, but it is still very expensive at O(2^n), which is not feasible; unless you add clustering and distributed computing, Ruby on a single server won't be very useful to you.
I would recommend coming up with a greedy criterion instead of using DP or brute force, which would be easier to implement.
Once your program finishes you can memoize the results and store them somewhere for later lookups, which can also save you some cycles.
In terms of code, you'll need to implement vertices and edges that have weights: i.e. a Vertex class whose edges carry weights, and then a Graph class that will populate the data.
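A minimal sketch of that structure, with one illustrative greedy criterion (always hop to the cheapest edge leading to an unvisited vertex); the class and method names here are mine, nothing standard:

```ruby
class Vertex
  attr_reader :name, :edges # edges: { Vertex => weight }

  def initialize(name)
    @name  = name
    @edges = {}
  end

  def connect(other, weight)
    @edges[other]     = weight
    other.edges[self] = weight # undirected
  end
end

class Graph
  def initialize(vertices)
    @vertices = vertices
  end

  # Greedy tour: from the last vertex, take the cheapest edge to an
  # unvisited vertex; stop early if no such edge exists.
  def greedy_tour(start)
    tour = [start]
    while tour.size < @vertices.size
      step = tour.last.edges.reject { |v, _| tour.include?(v) }
                            .min_by { |_, w| w }
      break unless step
      tour << step.first
    end
    tour
  end
end

a, b, c = Vertex.new("A"), Vertex.new("B"), Vertex.new("C")
a.connect(b, 2); b.connect(c, 1); a.connect(c, 4)
p Graph.new([a, b, c]).greedy_tour(a).map(&:name) # => ["A", "B", "C"]
```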
I worked on using meta-heuristic algorithms such as Ant Colony Optimization to solve TSP instances like the Bays29 (29-city) problem, and it gave me close-to-optimal solutions in a very short time. You can potentially use the same approach.
I wrote it in Java, but I will link it here anyway because I am currently working on a port to Ruby:
Java: https://github.com/mohammedri/ant_colony_java_TSP
Ruby: https://github.com/mohammedri/aco-ruby (incomplete)
This is the dataset it solves for: https://github.com/jorik041/osmsharp/blob/master/Core/OsmSharp.Tools/Benchmark/TSPLIB/Problems/TSP/bays29.tsp
Keep in mind I am using the Euclidean distance between each pair of cities, i.e. the straight-line distance. I don't think that is ideal in a real-life situation, considering roads and a city map etc., but it may be a good starting point :)
If you want the cost of the solution produced by the algorithm to be guaranteed within 3/2 of the optimum, then you want the Christofides algorithm. ACO and GA don't come with a guaranteed cost.
Related
Let's say my data set is a shopping mall.
I have to build a graph for it. Whenever asked, I have to generate a path (shortest path) from one shop to another.
Now my question is,
Is it efficient to build a graph of the whole building and generate the path?
Or build a graph (something like a subgraph) between only the 2 nodes and all their connectors (edges) when a user needs to find the path?
I have to implement this for a mobile application where all the data is loaded from a server.
My current code builds the whole graph. But I want to use this as a library for future use.
If it is only for the current building, then it works fine.
But assuming that in the future another data set is used that is much bigger than the current one, which of these methods is more efficient?
These are the only 2 ways I can think of implementing it. If there is any other solution then that would be highly appreciated!
Secondly, I am using Dijkstra's Algorithm for path finding, is that suitable for this kind of a case?
Any help would be highly appreciated,
Thanks.
Is it efficient to build a graph of the whole building and generate the path? Or build a graph (something like a subgraph) between only the 2 nodes and all its connectors (edges) when a user needs to find the path?
If the graph is known a priori, the most efficient solution with regard to query times is to generate the whole graph and preprocess it. You then query the contracted graph and get very fast query times. Look, for example, at contraction hierarchies, one of the most widely used techniques. Otherwise, when the graph has to be built at runtime (which I think is what you mean by your second point), you could use A* or bidirectional Dijkstra. For A*, I guess the best heuristic you could come up with is the straight-line distance, which probably won't help much.
Secondly, I am using Dijkstra's Algorithm for path finding, is that suitable for this kind of a case?
Yes it is, but I would always use bidirectional Dijkstra: it's not difficult to implement and is generally a great improvement in running time over unidirectional Dijkstra. Some related questions on SO: 1, 2
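To show how little code it takes, here is a minimal sketch, assuming an undirected graph stored as a Hash of the form { node => { neighbour => weight } }. For clarity it scans the frontier for its minimum instead of using a priority queue; a real implementation would swap in a heap:

```ruby
def bidirectional_dijkstra(graph, source, target)
  return 0 if source == target
  dist    = { f: { source => 0 }, b: { target => 0 } }
  settled = { f: {}, b: {} }
  best    = Float::INFINITY

  # Cheapest not-yet-settled node in one direction, as [node, cost].
  frontier_min = lambda do |d|
    dist[d].reject { |n, _| settled[d][n] }.min_by { |_, c| c }
  end

  loop do
    front_f, front_b = frontier_min.call(:f), frontier_min.call(:b)
    break if front_f.nil? || front_b.nil?
    # Stop once the two frontiers together cannot beat the best meeting.
    break if front_f[1] + front_b[1] >= best

    dir   = front_f[1] <= front_b[1] ? :f : :b
    other = dir == :f ? :b : :f
    node, d_node = dir == :f ? front_f : front_b
    settled[dir][node] = true

    (graph[node] || {}).each do |nbr, w|
      nd = d_node + w
      dist[dir][nbr] = nd if nd < dist[dir].fetch(nbr, Float::INFINITY)
      # The two searches meet at nbr: record the best joint path so far.
      if dist[other].key?(nbr)
        best = [best, dist[dir][nbr] + dist[other][nbr]].min
      end
    end
  end
  best # Float::INFINITY when no path exists
end
```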
I am wondering how to use Neo4j to find the MST. Most examples I found use Hadoop for this.
I don't think that this is possible in Cypher, given how current algorithms determine an MST (if I'm wrong on this, I'd love to know).
Instead, I'd recommend implementing one of the algorithms used for determining an MST, e.g. Prim's algorithm. It's quite straightforward and, with the help of heaps and adjacency lists, relatively performant.
A quick search for the algorithm will turn up many links.
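For reference, a compact version over an assumed adjacency-list format, sketched in Ruby for brevity (the algorithm itself is language-agnostic; a binary heap would replace the linear scan in a performance-sensitive version):

```ruby
# graph: { node => [[neighbour, weight], ...] }, undirected.
def prim_mst(graph)
  start   = graph.keys.first
  visited = { start => true }
  mst     = []
  until visited.size == graph.size
    # Cheapest edge leaving the visited set.
    from, to, weight = nil, nil, Float::INFINITY
    visited.each_key do |u|
      graph[u].each do |v, w|
        from, to, weight = u, v, w if !visited[v] && w < weight
      end
    end
    break unless to # graph is disconnected
    mst << [from, to, weight]
    visited[to] = true
  end
  mst
end
```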
I'm sure leveraging Neo4j's Core API or Traversal API could help things integrate even more closely, possibly without needing to represent the entire graph as an adjacency list first. And of course you can do that with Neo4j in embedded mode, or turn it into a server plugin if you're running Neo4j in server mode.
Let us know what you come up with!
I'm trying to build an implicit spell checker that maps input words to some more general phonetic representation to account for typos, basically for a search bar that automatically corrects your spelling to a degree. The encodings I've been looking into are Metaphone, NYSIIS, and Soundex, but I don't really know which would be best for this application.
I would prefer more matches over fewer, and I would like the matching to be fairly general, so I was leaning towards Soundex, which seems to be a more approximate mapping than the original Metaphone, but I don't really know how large the difference in vagueness is. I know that NYSIIS is fairly similar to Soundex, but I don't have a good sense of how similar, or of how NYSIIS compares to Metaphone.
I am also looking for the solution that is quickest to execute. I know these phonetic mappers are usually pretty fast, but I'm not sure which is fastest; since I would like spell checking not to add to search time, speed is a consideration. Thoughts?
I managed to find a wonderful article on this over here:
http://www.informit.com/articles/article.aspx?p=1848528
Not quite everything I was looking for, but a pretty large amount of it.
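For a quick hands-on comparison in Ruby, the `text` gem (gem install text) ships Soundex and Metaphone implementations; NYSIIS is not included there, so this sketch leaves it out:

```ruby
require "text"

# Compare how coarsely each encoding buckets similar-sounding words.
%w[smith smyth schmidt catherine kathryn].each do |word|
  puts format("%-10s soundex=%-6s metaphone=%s",
              word,
              Text::Soundex.soundex(word),
              Text::Metaphone.metaphone(word))
end
```

Words that collide under Soundex but not under Metaphone give a rough sense of how much vaguer Soundex is.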
I'm working on a project to generate questions from sentences. Right now, I'm at a point where I can generate questions like:
"Angela Merkel is the chancelor of Germany." -> "Angela Merkel is who?"
Now, of course, I want the questions to look like "Who is...?" instead. Is there any easy way to do this that I haven't thought of yet?
My current idea is to train an English (not-quite-question) -> English (question) translator, maybe using an existing machine translation engine like Moses. Is this overkill? How much data would I need? Are there corpora that address this or a similar problem? Is using a general translation engine even appropriate for this task?
Check out Michael Heilman's dissertation Automatic Factual Question Generation from Text for background on question generation and to see what his approach to this problem looks like. You can find more by searching for research on "question generation". He mentions a corpus from Microsoft: the Microsoft Research Question-Answering Corpus.
I don't think that an approach based solely on (current) statistical machine translation approaches is going to work that well, since you're usually going to need a deeper syntactic analysis of the source sentence to do a good job of generating an appropriate question. For simple questions like your example, it's pretty easy to design syntactic tree transformations to generate the question, but it gets much trickier as soon as the sentences get a little more complicated.
Off the top of my head: if you restrict yourself to relatively simple questions, you could do a parse and then flip the elements around to get the question. How do you decide the question word, though? Who, What, Where, Why... for this you'll need a classifier that looks at the elements of a sentence. Angela Merkel should be easy to classify as a person/name, so she gets a 'Who'; Berlin should be in a dictionary of geographic names, so it gets a 'Where'.
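A toy illustration of that flip, using a regex and a tiny hand-made gazetteer in place of a real parser and classifier (everything here is invented for the example):

```ruby
# Stand-in for a named-entity classifier.
PEOPLE = ["Angela Merkel"].freeze

# Turn "<subject> is <complement>." into a wh-question.
def to_question(sentence)
  return unless sentence =~ /\A(.+?) is (.+)\.\z/
  subject, complement = $1, $2
  wh = PEOPLE.include?(subject) ? "Who" : "What"
  "#{wh} is #{complement}?"
end

puts to_question("Angela Merkel is the chancellor of Germany.")
# => Who is the chancellor of Germany?
```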
I'm not sure about specific software, but I'd probably do it with NLTK, using a dependency parse and then whatever classification scheme you feel like.
Ultimately your success depends on how big your input and output space is. I'd go for the absolute simplest possible problem first.
<tl;dr>
In source version control diff patch generation, would it be worth it to use the optimizations listed at the very bottom of this writing (see <optimizations>) in my Ruby implementation of diff for making diff patches?
</tl;dr>
<introduction>
I am programming something I have never done before, and there might already be tools out there that do the exact thing I am building, but at this point I am having too much fun to care, so I am going to do it from scratch anyway.
So anyway, I am working on a Ruby on Rails app and need a certain feature. Basically, I want each entry in one of my tables, say a table of video games, to have a stored chunk of text representing a review or something of the sort for that entry. However, I want this text to be editable by any registered user, and I also want to track different submissions in a version control system. The simplest solution I could think of is to keep track of the text body and the diff-patch history of its versions as Ruby objects, and then serialize them, preferably in a human-readable form (so I'll most likely use YAML), so they can be edited by hand if they are ever corrupted by a software bug or a mistake made by an admin doing some version editing.
At first I just dove head-first into this feature, only to find that generating a diff patch efficiently is harder than I thought. So I did some research and came across some ideas, some of which I have implemented and some not. It all pretty much revolves around the longest common subsequence problem, as you will already know if you have done anything with diff or diff-like features, and around optimizing the function that solves it.
Currently I truncate the compared versions of the text body from the beginning and end until non-matching lines are found. Then I solve the problem using a comparison matrix, but instead of incrementing the value stored in a cell when I find a matching line, as in most longest-common-subsequence examples I have seen, I increment on a non-matching line, so that I compute edit distance instead of the longest common subsequence. As far as I can tell, the two approaches are essentially two sides of the same coin, so either can be used to derive an answer. The code then back-traces through the comparison matrix, noting where an incrementation happened and from which adjacent cell (west, northwest, or north), to determine each line's diff entry, and assumes all other lines are unchanged.
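For concreteness, here is a condensed sketch of that matrix-plus-backtrace idea in the classic LCS formulation (since, as noted, the two are interchangeable); this is illustrative code mirroring the description, not the project's actual implementation:

```ruby
# Returns [[:keep, line], [:del, line], [:add, line], ...] in order.
def diff(old_lines, new_lines)
  m, n = old_lines.size, new_lines.size
  # lcs[i][j] = LCS length of old_lines[0...i] and new_lines[0...j].
  lcs = Array.new(m + 1) { Array.new(n + 1, 0) }
  (1..m).each do |i|
    (1..n).each do |j|
      lcs[i][j] = if old_lines[i - 1] == new_lines[j - 1]
                    lcs[i - 1][j - 1] + 1
                  else
                    [lcs[i - 1][j], lcs[i][j - 1]].max
                  end
    end
  end
  # Backtrace from the bottom-right corner, emitting diff entries.
  ops, i, j = [], m, n
  while i > 0 || j > 0
    if i > 0 && j > 0 && old_lines[i - 1] == new_lines[j - 1]
      ops << [:keep, old_lines[i - 1]]; i -= 1; j -= 1
    elsif j > 0 && (i.zero? || lcs[i][j - 1] >= lcs[i - 1][j])
      ops << [:add, new_lines[j - 1]]; j -= 1
    else
      ops << [:del, old_lines[i - 1]]; i -= 1
    end
  end
  ops.reverse
end
```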
Normally I would leave it at that, but since this is going into a Rails environment and not some stand-alone Ruby script, I started worrying about optimizing at least enough that a spammer who somehow knew how I implemented the version control system, and knew my worst-case entry, still couldn't hit the server that hard. After searching and reading research papers and articles, I've come across several ideas that seem decent, but all have pros and cons, and I am having a hard time deciding how they balance out in this situation. So, are the ones listed here worth it? I have listed them with their known pros and cons.
</introduction>
<optimizations>
Chop the compared sequences into multiple subsequences by splitting where lines are unchanged, then truncate the unchanged lines at the beginning and end of each section. Then solve the edit distance of each subsequence.
Pro: Changes the growth in time, as the changed area gets bigger, from quadratic to something closer to linear.
Con: Figuring out where to split already looks like solving edit distance, except now you don't care how things changed. It would be fine if this were solvable by something closer to Hamming distance, but a single insertion throws that off.
Use a cryptographic hash function to convert all sequence elements into integers while ensuring uniqueness, then solve the edit distance by comparing the hash integers instead of the sequence elements themselves. (See also the interning sketch after this list for a cheaper way to get the same effect.)
Pro: Comparing two integers is faster than comparing two strings, so a slight performance gain is made on every comparison, which can add up to a lot overall.
Con: Running a cryptographic hash function over all the sequence elements takes time, and may end up costing more than the integer comparisons gain back. You could use the built-in hash function for a string, but that will not guarantee uniqueness.
Use lazy evaluation to calculate only the three centre-most diagonals of the comparison matrix, and then calculate additional diagonals only as needed. Also use this approach to possibly remove the need, for some comparisons, to compare all three adjacent cells, as described here.
Pro: Can turn an algorithm that always takes O(n * m) time into one where that is only the worst case; the best case becomes practically linear and the average case falls somewhere between the two.
Con: It is an algorithm I've only seen implemented in functional programming languages, and I am having a difficult time working out how to translate it into Ruby from the description at the site linked above.
Write a C module that does the hard work at the native level, with a Ruby wrapper so Ruby can make all the calls it needs.
Pro: I have to imagine that evaluating something like this in C would be a LOT faster.
Con: I have no idea how Rails handles apps whose Ruby code has C extensions, and it hurts the portability of the app.
This one is an optimization for after the edit distance is solved, but the idea is to store additional combined diffs alongside the per-version ones, forming a delta-tree data structure with the most recently made diff as the root node, so that reaching any version takes O(log n) worst-case time instead of O(n).
Pro: Would make going back to an old version a lot faster.
Con: Every new commit would give the delta-tree a new root node, and reorganizing the tree costs time on an operation (committing) that happens far more often than reverting, not to mention the reverted-to version is unlikely to be an old one anyway.
</optimizations>
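On the second optimization, a middle ground worth sketching is interning: instead of a cryptographic hash, map each distinct line to a small integer ID through a lookup table. Equal lines get equal IDs and distinct lines can never collide, so you get guaranteed-unique integer comparisons for the price of one Hash lookup per line (file names here are illustrative):

```ruby
# Map each distinct line to a unique small integer ID.
def intern_lines(*sequences)
  table = Hash.new { |h, line| h[line] = h.size }
  sequences.map { |lines| lines.map { |line| table[line] } }
end

old_ids, new_ids = intern_lines(File.readlines("old.txt"),
                                File.readlines("new.txt"))
# old_ids and new_ids can now be fed to the edit-distance routine,
# which then compares integers instead of strings.
```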
So are these things worth the effort?
With regard to item 4 in your list, this seems to be (from what I can tell) how most gems work when there is any heavy lifting to be done by the code. Rails plays nicely with the gem system, so if you need to incorporate this, probably alongside the other optimisations you have suggested here, it should be fine, although you may need to recompile for different platforms.