Good Afternoon,
I'm currently planning a web app/service project with a geolocation-enabled user model (lat/lng, etc.), and I was wondering what the best approach would be to find the n biggest 'hot spots', i.e. geolocations with a given radius (e.g. 10 miles) where the most users are located.
Does anyone know a good, practical clustering algorithm or another (existing) solution? This is a pretty bird's-eye-view kind of question, I know... but backend-technology-wise I'm still open to anything, as this particular feature is obviously only one part of the whole feature set, but it might help push a decision towards a particular set of tools/languages/environments.
Cheers & thanks,
-J
SQL Server's spatial data types would be worth a look. They let you index the geography column and run distance queries. I'm not sure how easy it would be to group by radius, but just having the geography data type and building indexes on it should help a lot with this type of problem.
Geography Methods Supported by Spatial Indexes
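For instance, a rough T-SQL sketch (the table, columns and sample point are made up, not anything from your project) of storing users as geography points, indexing them, and counting how many fall within a 10-mile radius of a candidate hot-spot centre:

```sql
-- Illustrative only: table/column names and the sample point are assumptions.
CREATE TABLE Users (
    Id       INT PRIMARY KEY,
    Location GEOGRAPHY          -- the user's lat/lng stored as a point
);

CREATE SPATIAL INDEX IX_Users_Location ON Users(Location);

-- geography::Point takes latitude first, then longitude, then the SRID.
DECLARE @centre GEOGRAPHY = geography::Point(47.6062, -122.3321, 4326);

-- Count users within 10 miles (~16,093 metres) of the candidate centre;
-- STDistance on the geography type returns metres.
SELECT COUNT(*) AS UsersNearby
FROM Users
WHERE Location.STDistance(@centre) <= 16093;
```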
Let's say my data set is a shopping mall.
I have to build a graph for it. Whenever asked, I have to generate a path (shortest path) from one shop to another.
Now my question is:
Is it efficient to build a graph of the whole building and generate the path?
Or should I build a graph (something like a subgraph) between only the 2 nodes and all their connectors (edges) when a user needs to find the path?
I have to implement this for a mobile application where all the data is loaded from a server.
My current code builds the whole graph. But I want to use this as a library for future use.
If it is only for the current building, then it works fine.
But assuming that in the future another data set is used that is way bigger than the current one, which of these methods would be more efficient?
These are the only 2 ways I can think of implementing it. If there is any other solution then that would be highly appreciated!
Secondly, I am using Dijkstra's algorithm for pathfinding. Is that suitable for this kind of case?
Any help would be highly appreciated,
Thanks.
Is it efficient to build a graph of the whole building and generate the path? Or should I build a graph (something like a subgraph) between only the 2 nodes and all their connectors (edges) when a user needs to find the path?
If the graph is known a priori, the most efficient solution with regard to query times is to build the whole graph and preprocess it. You then query the contracted graph and get very fast query times; look, for example, at contraction hierarchies, since it is one of the most widely used techniques. Otherwise, when the graph has to be built at runtime (I think that is what you mean by your second point), you could use A* or bidirectional Dijkstra. For A*, the best heuristic you can probably come up with here is the straight-line distance, so it may not be very helpful.
Secondly, I am using Dijkstra's algorithm for pathfinding. Is that suitable for this kind of case?
Yes, it is, but I would always use bidirectional Dijkstra: it's not difficult to implement and is generally a great improvement in running time over unidirectional Dijkstra. Some related questions on SO: 1, 2
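For reference, here is a minimal unidirectional Dijkstra sketch in Ruby over an adjacency-list graph; the mall graph at the bottom is entirely made up, and a production version would use a proper priority queue (and, as suggested above, could search from both ends):

```ruby
# Minimal Dijkstra sketch over an adjacency-list graph (hash of hashes).
# Node names and the example graph below are purely illustrative.
def dijkstra(graph, source, target)
  dist = Hash.new(Float::INFINITY)
  prev = {}
  dist[source] = 0
  unvisited = graph.keys

  until unvisited.empty?
    u = unvisited.min_by { |n| dist[n] }   # O(V) extraction; fine for small graphs
    break if u == target || dist[u] == Float::INFINITY
    unvisited.delete(u)

    graph[u].each do |v, weight|
      alt = dist[u] + weight
      if alt < dist[v]
        dist[v] = alt
        prev[v] = u
      end
    end
  end

  # Reconstruct the path by walking predecessors back from the target.
  path = [target]
  path.unshift(prev[path.first]) while prev[path.first]
  path.first == source ? path : nil
end

# Hypothetical mall graph: shops as nodes, walking distances as edge weights.
mall = {
  "entrance" => { "shop_a" => 2, "shop_b" => 5 },
  "shop_a"   => { "entrance" => 2, "shop_c" => 3 },
  "shop_b"   => { "entrance" => 5, "shop_c" => 1 },
  "shop_c"   => { "shop_a" => 3, "shop_b" => 1 }
}
puts dijkstra(mall, "entrance", "shop_c").inspect  # => ["entrance", "shop_a", "shop_c"]
```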
Cassandra is based on the Dynamo paper (a distributed, self-balancing hash table) plus BigTable, and there are spatial indexes (quadkey or geohash) that would fit nicely into that paradigm. Is there a reason geospatial support hasn't been implemented?
You could add a GeoPoint datatype as a tuple with an internal geohash and mark a CF as containing geo data. From there you could choose whether the geo data behaves as a secondary index or as a denormalized SCF. That could lay the groundwork for geospatial development, and you could start by implementing some low-hanging fruit such as .nearby(), which could just return columns that share the same geohash. (I know that wouldn't give you the "nearest"; for that you'd have to walk the surrounding geohashes, or use a shape and a space-filling curve, which could be implemented later, but it is a general operation for finding some nearby columns.)
I know SimpleGeo/Urban Airship built geo support into Cassandra, but it doesn't look like that was ever opened up. Also, let me know if there's a better place to ask this (quora, mailing lists, etc...)
I think there are two parts to the answer.
The reason it's not there is that nobody who commits code to Cassandra has thought of this feature, or considered it high-priority enough to spend serious time on it. Most Cassandra development is done by DataStax, and they, being a commercial entity, pay close attention to user demands and suggestions and are pretty pragmatic about which new features give them the most ROI.
If there were a good enough third-party developer (or team) with enough time on their hands, this could be done, and conceptually the C* committers would likely have no problem with adding a major feature like this.
The second aspect is that Cassandra supports blobs (byte arrays), which means that what you're describing can be implemented in the client app/driver in a relatively straightforward manner. The driver would in that case be responsible for translating geo calls into the appropriate raw byte operations. I also suspect this would be less work than supporting a whole new data primitive, with its relevant set of operators, in the core storage engine.
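As a rough illustration of that client-side approach (not tied to any particular Cassandra driver), here is a standard geohash encoder in Ruby; keys that share a prefix are geographically close, which is the basis for a crude .nearby() implemented as a prefix/range scan over geohash-keyed columns:

```ruby
# Standard geohash encoding: interleave longitude/latitude bits, emit base-32 chars.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lng, precision = 9)
  lat_range = [-90.0, 90.0]
  lng_range = [-180.0, 180.0]
  result = ""
  bit = 0
  ch = 0
  even = true  # geohash interleaving starts with a longitude bit

  while result.length < precision
    range, value = even ? [lng_range, lng] : [lat_range, lat]
    mid = (range[0] + range[1]) / 2
    if value >= mid
      ch = (ch << 1) | 1
      range[0] = mid
    else
      ch = ch << 1
      range[1] = mid
    end
    even = !even
    bit += 1
    if bit == 5
      result << BASE32[ch]
      bit = 0
      ch = 0
    end
  end
  result
end

# Two points a few hundred metres apart typically share a long common prefix,
# so a prefix/range scan over geohash keys approximates a .nearby() lookup.
puts geohash(52.5200, 13.4050)   # central Berlin
puts geohash(52.5206, 13.4094)   # ~500 m away, same leading characters
```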
I have a request to develop an application that keeps track of the movements of a certain item (or items). To better demonstrate what the application must do, I drew a diagram (a simplified abstraction).
As I never worked with any databases other than the relational ones, I really don't know if I can solve this problem with a graph database.
These questions must be answered by the system:
What was the path that a certain pen drive walked?
I passed some pen drives on. Where are they now?
Which pen drives did I receive, where did they come from, and where did they go?
Where are the pen drives I burned and passed on? And who has them now?
Any help and suggestions are much appreciated.
Thanks
In Neo4j everything is either a node or a relationship. So it's useful to think: what would be my nodes and relationships?
Here it might be, for example, that every "pen drive", "person" and "location" is a node. Verbs like "walk" or "give" would be your relationships.
In this model, you'd be able to use Cypher to query for things like "give me all location nodes connected to pen drive nodes by the relationship walk", or "start at all person nodes and return those that have a give relationship to a pen drive node which doesn't have a give relationship connecting back to the starting person node".
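Purely as an illustration (the node labels Person, PenDrive, Location and the relationship types WALKED and GAVE below are assumptions, not a fixed schema), those two queries might look roughly like this in Cypher:

```cypher
// Illustrative only: labels, relationship types and directions are assumptions.

// "All location nodes connected to pen drive nodes by the WALKED relationship."
MATCH (pd:PenDrive)-[:WALKED]->(loc:Location)
RETURN pd, loc;

// "People who gave away a pen drive that has no give relationship back to them."
MATCH (p:Person)-[:GAVE]->(pd:PenDrive)
WHERE NOT (pd)-[:GAVE]->(p)
RETURN p, pd;
```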
This rich graph query language gives you nice algorithms like shortest path for free, so beyond a transactional record you could determine whether, for example, a pen drive made it from A to B along the optimal path. But as you can see above, "relational joins" do not beget simple queries or descriptions thereof.
When it comes to database design, when the model becomes cumbersome to map mentally, it's going to be a pain to develop too. Design your database based on how you plan to query it. If you're unable to easily explain those queries in terms of Neo4j, it's possible that Neo4j isn't going to be the best fit.
Sorry for the fairly open question, but I was wondering whether anyone had any advice on the best way to create an app that searches for properties within a particular radius.
The best example of what I am looking to achieve is RightMove.
I was wondering what the best setup would be for adding city, town and postcode data and making it searchable.
I have been reading about Geocoder but was wondering whether this would be the best option for such an app, or whether there are good alternatives. For example, would you recommend storing all the location data in my own database, or using an API to feed in this information?
Any advice or links people can offer really would be appreciated! Thanks.
The approach depends purely on your requirements and on the availability of geocoded data for the locations you are interested in.
Using Geocoder gives you the advantage that you don't have to bother with updating your geo-database for a given location. It has its own downsides (request timeouts, data not available for a particular location, licensing, query limits, etc.), but they can be addressed.
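For example, a minimal Rails sketch using the geocoder gem (the Property model, its address column and the sample postcode are assumptions, not part of your app):

```ruby
# Minimal geocoder-gem sketch; model and column names are made up.
class Property < ApplicationRecord
  geocoded_by :address                 # address column holds the full address string
  after_validation :geocode, if: :will_save_change_to_address?  # Rails 5.1+ dirty tracking
end

# Radius search: properties within 10 miles of a postcode or of a lat/lng pair.
# The geocoder gem's `near` defaults to miles.
Property.near("SW1A 1AA", 10)
Property.near([51.5074, -0.1278], 10)
```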
If you are okay with storing the data in your own DB, then you can achieve the same thing with a PostgreSQL + PostGIS setup. The PostGIS module gives you the ability to do spatial querying in terms of a radius, checking whether a given geo-point falls within a pre-defined polygon, etc., and since these queries are executed inside the DB, the performance is also very good. This approach has two advantages: you don't have to sign up for any service, and there are no timeout errors. The downside is that you have to maintain/update the location data yourself.
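As a rough sketch (table and column names assumed), the radius query might look something like this in PostGIS, using the geography type so distances are in metres:

```sql
-- Illustrative only: table/column names and the sample point are assumptions.
CREATE TABLE properties (
    id       serial PRIMARY KEY,
    address  text,
    location geography(Point, 4326)   -- lat/lng stored as a geographic point
);

CREATE INDEX idx_properties_location ON properties USING GIST (location);

-- Properties within 10 miles (~16,093 m) of a given point (lng first, then lat).
SELECT id, address
FROM properties
WHERE ST_DWithin(location,
                 ST_SetSRID(ST_MakePoint(-0.1278, 51.5074), 4326)::geography,
                 16093);
```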
I have done a handful of RoR projects with the second approach, and it is working quite well for us.
Hope this helps.
I am working at a delivery company. We currently solve routes of 50+ locations by hand.
I have been thinking about using the Google Maps API to solve this problem, but I have read that there is a 24-point limit.
Currently we are using Rails on our server, so I am thinking about a Ruby script that would take the coordinates of the 50+ locations and output a reasonable solution.
What algorithm would you use to approach this problem?
Is Ruby a good programming language to solve this type of problem?
Do you know of any existing ruby script?
This might be what you are looking for:
Warning: this site gets flagged by Firefox as an attack site, but it doesn't appear to be one. In fact, I have used it before without a problem.
[Check revision history for URL]
rubyquiz seems to be down (it has been down for a bit); however, you can still use the Wayback Machine at archive.org to see that page:
http://web.archive.org/web/20100105132957/http://rubyquiz.com/quiz142.html
Even with the DP solution mentioned in another answer, that's going to require O(10^15) operations. So you're going to have to look at approximate solutions, which are probably acceptable given that you currently do them by hand. Look at http://en.wikipedia.org/wiki/Travelling_salesman_problem#Heuristic_and_approximation_algorithms
Here are a couple of tricks:
1: Lump locations that are relatively close together into one group, and turn that group into a single node in your main graph. This lets you be greedy without too much work.
2: Use an approximation algorithm.
2a: My favorite is bitonic tours. They're pretty easy to hack up.
See Update
Here's a py lib with a bitonic tour and here's another
Let me go look for a Ruby one. I'm having trouble finding more than just the RGL, which has efficiency issues...
Update
In your case, the minimum spanning tree attack should be effective. I can't think of a case where your cities wouldn't meet the triangle inequality. This means that there should be a relatively sort-of kind-of almost-fast, rather decent approximation. Particularly if the distance is Euclidean, which, again, I think it must be.
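A rough Ruby sketch of that MST approach, with made-up coordinates: build a minimum spanning tree with Prim's algorithm over the full distance matrix, then take a preorder walk of the tree. With distances satisfying the triangle inequality, the resulting tour is at most twice the optimal length:

```ruby
# Minimal MST-based TSP approximation (assumes symmetric, triangle-inequality distances).
def distance(a, b)
  Math.sqrt((a[0] - b[0])**2 + (a[1] - b[1])**2)
end

def mst_tour(points)
  n = points.size
  in_tree  = [true] + [false] * (n - 1)    # start the tree at stop 0
  children = Hash.new { |h, k| h[k] = [] }

  # Prim's algorithm: grow the tree one cheapest edge at a time (O(n^3), fine for ~50 stops).
  (n - 1).times do
    best_cost, best_u, best_v = Float::INFINITY, nil, nil
    n.times do |u|
      next unless in_tree[u]
      n.times do |v|
        next if in_tree[v]
        d = distance(points[u], points[v])
        best_cost, best_u, best_v = d, u, v if d < best_cost
      end
    end
    in_tree[best_v] = true
    children[best_u] << best_v
  end

  # A preorder walk of the MST yields a tour at most 2x the optimum.
  tour = []
  stack = [0]
  until stack.empty?
    node = stack.pop
    tour << node
    children[node].reverse_each { |c| stack << c }
  end
  tour
end

stops = [[0, 0], [1, 5], [4, 1], [6, 6], [2, 2]]   # made-up delivery coordinates
puts mst_tour(stops).inspect                        # visiting order as indices into stops
```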
One of the optimized solutions uses dynamic programming, but it is still very expensive, O(2**n), which is not very feasible unless you use some clustering and distributed computing; Ruby on a single server won't be very useful for you there.
I would recommend you come up with a greedy criterion (e.g. always visit the nearest unvisited location next) instead of using DP or brute force, which would be easier to implement.
Once your program ends, you can do some memoization and store the results somewhere for later lookups, which can also save you some cycles.
In terms of the code, you'll need to implement vertices and edges that have weights, i.e. a vertex class that holds its weighted edges, and then a graph class that populates the data.
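And a rough sketch of such a greedy (nearest-neighbour) heuristic in Ruby, with made-up coordinates: start anywhere and repeatedly visit the closest unvisited stop. It won't be optimal, but it is trivial to implement and usually much better than an arbitrary ordering:

```ruby
# Greedy nearest-neighbour tour (illustrative; stops are made-up [x, y] pairs).
def nearest_neighbour_tour(stops)
  unvisited = (1...stops.size).to_a
  tour = [0]                             # start at the first stop
  until unvisited.empty?
    current = stops[tour.last]
    nxt = unvisited.min_by do |i|
      Math.sqrt((stops[i][0] - current[0])**2 + (stops[i][1] - current[1])**2)
    end
    tour << nxt
    unvisited.delete(nxt)
  end
  tour
end

stops = [[0, 0], [5, 2], [1, 4], [6, 6], [2, 1]]
puts nearest_neighbour_tour(stops).inspect   # visiting order as indices into stops
```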
I worked on using meta-heuristic algorithms such as Ant Colony Optimization to solve TSP instances like the Bays29 (29-city) problem, and it gave me close-to-optimal solutions in a very short time. You could potentially do the same.
I wrote it in Java, though; I will link it here anyway, because I am currently working on a Ruby port:
Java: https://github.com/mohammedri/ant_colony_java_TSP
Ruby: https://github.com/mohammedri/aco-ruby (incomplete)
This is the dataset it solves for: https://github.com/jorik041/osmsharp/blob/master/Core/OsmSharp.Tools/Benchmark/TSPLIB/Problems/TSP/bays29.tsp
Keep in mind I am using the Euclidean distance between cities, i.e. the straight-line distance. I don't think that is ideal in a real-life situation, considering roads, the city map, etc., but it may be a good starting point :)
If you want the cost of the solution produced by the algorithm to be within 3/2 of the optimum, then you want the Christofides algorithm. ACO and GA don't come with a guaranteed bound.