I am wondering how to use Neo4j to find the MST (minimum spanning tree). Most examples I found use Hadoop to find it.
I don't think that this is possible in Cypher, given how current algorithms determine an MST (if I'm wrong on this, I'd love to know).
Instead, I'd recommend implementing one of the algorithms used for determining an MST, e.g. Prim's algorithm. It's quite straightforward and, with the help of heaps and adjacency lists, is relatively performant.
A quick search for the algorithm will turn up many links.
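As a rough illustration, here is a minimal sketch of Prim's algorithm in Python using a binary heap and an adjacency list. It runs on a plain in-memory dictionary, not on Neo4j; the function name `prim_mst`, the sample graph, and the assumption that the graph is connected and undirected are all mine.

```python
import heapq

def prim_mst(adjacency, start):
    """Return (total_weight, edges) for an MST of a connected, undirected graph.

    adjacency maps each node to a list of (neighbor, weight) pairs.
    """
    visited = {start}
    # Heap entries are (weight, from_node, to_node); the cheapest edge is popped first.
    frontier = [(w, start, v) for v, w in adjacency[start]]
    heapq.heapify(frontier)
    total = 0
    edges = []
    while frontier and len(visited) < len(adjacency):
        w, u, v = heapq.heappop(frontier)
        if v in visited:
            continue  # this edge would close a cycle, skip it
        visited.add(v)
        total += w
        edges.append((u, v, w))
        for nxt, nw in adjacency[v]:
            if nxt not in visited:
                heapq.heappush(frontier, (nw, v, nxt))
    return total, edges

# Hypothetical sample graph (each undirected edge is listed in both directions).
graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("a", 1), ("c", 2), ("d", 5)],
    "c": [("a", 4), ("b", 2), ("d", 3)],
    "d": [("b", 5), ("c", 3)],
}
total, edges = prim_mst(graph, "a")
print(total, edges)
```

With a binary heap this takes O(E log E) time. The adjacency dictionary could in principle be filled from the result of a query that returns every relationship with its weight, but that step is not shown here.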
I'm sure leveraging Neo4j's Core API or Traversal API would let this integrate more closely, possibly without needing to represent the entire graph as an adjacency list first. And of course you can do that with Neo4j in Embedded Mode, or turn it into a Server Plugin if you're running Neo4j in Server Mode.
Let us know what you come up with!
Related
I'm working with a large data set that really warrants a graph DB. My goal is to visualize the data set and identify trends in it to make decisions.
I'm currently using Neo4j and I really like the tool; however, the nodes returned are capped at 300. This number is only a fraction of my data and doesn't really allow me to gain the insight I've been looking for, even with queries to filter out portions. Additionally, I'd really like to add node weights and colors based on conditions, which isn't possible using just Neo4j.
Has anybody found a solution to this problem? I'd imagine there may be some client-side libraries designed for these sorts of problems. Alternatively, I wouldn't be opposed to switching to some other graph DB better suited to solving these problems.
I would suggest using Neo4j Bloom. It will give you a better visualization of your Neo4j data.
Let's say my data set is a shopping mall.
I have to build a graph for it. Whenever asked, I have to generate a path (shortest path) from one shop to another.
Now my question is,
Is it efficient to build a graph of the whole building and generate the path?
Or build a graph (something like a subgraph) between only the 2 nodes and all its connectors (edges) when a user needs to find the path?
I have to implement this for a mobile application where all the data is loaded from a server.
My current code builds the whole graph. But I want to use this as a library for future use.
If it is only for the current building, then it works fine.
But assuming that in the future another data set is used that is much bigger than the current one, which of these methods is more efficient?
These are the only 2 ways I can think of implementing it. If there is any other solution then that would be highly appreciated!
Secondly, I am using Dijkstra's Algorithm for path finding, is that suitable for this kind of a case?
Any help would be highly appreciated,
Thanks.
Is it efficient to build a graph of the whole building and generate the path?
Or build a graph (something like a subgraph) between only the 2 nodes and all its connectors (edges) when a user needs to find the path?
If the graph is known a priori, the most efficient solution with regard to query time is to generate the whole graph and preprocess it. You then query the contracted graph, which gives very fast query times. Look, for example, at contraction hierarchies, since it is one of the most widely used techniques. Otherwise, when the graph has to be built at runtime (which I think is what you mean by your second point), you could use A* or bidirectional Dijkstra. For A*, I guess the best heuristic you can come up with is the straight-line distance, so it is probably not very helpful.
Secondly, I am using Dijkstra's Algorithm for path finding, is that suitable for this kind of a case?
Yes it is, but I would always use bidirectional Dijkstra. It's not difficult to implement and is generally a great improvement in running time over unidirectional Dijkstra. Some related questions in SO: 1, 2
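To make the suggestion concrete, here is a minimal sketch of bidirectional Dijkstra in Python. It assumes an undirected graph with non-negative edge weights, stored as an adjacency dictionary; the function name and the small mall-like sample graph are made up for illustration, and it returns only the distance, not the path itself.

```python
import heapq

def bidirectional_dijkstra(adjacency, source, target):
    """Shortest-path distance in an undirected graph with non-negative weights.

    adjacency maps each node to a list of (neighbor, weight) pairs.
    Returns float('inf') when target is unreachable from source.
    """
    if source == target:
        return 0
    inf = float("inf")
    dist = [{source: 0}, {target: 0}]       # tentative distances: forward, backward
    heaps = [[(0, source)], [(0, target)]]  # one priority queue per direction
    settled = [set(), set()]
    best = inf                              # length of the best complete path seen so far
    while heaps[0] and heaps[1]:
        # Stop once the two frontiers together cannot beat the best known path.
        if heaps[0][0][0] + heaps[1][0][0] >= best:
            break
        # Expand whichever direction has the smaller frontier key.
        side = 0 if heaps[0][0][0] <= heaps[1][0][0] else 1
        d, u = heapq.heappop(heaps[side])
        if u in settled[side]:
            continue  # stale heap entry
        settled[side].add(u)
        for v, w in adjacency[u]:
            nd = d + w
            if nd < dist[side].get(v, inf):
                dist[side][v] = nd
                heapq.heappush(heaps[side], (nd, v))
            # If the opposite search has already reached v, the two searches meet here.
            if v in dist[1 - side]:
                best = min(best, nd + dist[1 - side][v])
    return best

# Hypothetical mall layout; weights are walking distances.
mall = {
    "entrance": [("atrium", 2)],
    "atrium": [("entrance", 2), ("shop_a", 4), ("escalator", 1)],
    "escalator": [("atrium", 1), ("shop_a", 2), ("shop_b", 6)],
    "shop_a": [("atrium", 4), ("escalator", 2), ("shop_b", 3)],
    "shop_b": [("escalator", 6), ("shop_a", 3)],
}
print(bidirectional_dijkstra(mall, "entrance", "shop_b"))
```

The search must not stop at the first node both directions have touched. It stops only when the sum of the two smallest frontier keys can no longer improve on the best meeting path. To recover the path, each direction would also need to record predecessors.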
I would like to modify the way Cypher processes the pattern-matching queries sent to it. I have read about execution plans and how Cypher chooses the plan with the fewest operations, which is pretty good. However, I am looking into implementing a similarity search feature that lets you specify a query graph and returns matches that are close (similar) when there is no exact match. I have seen a few examples of this in theory, and I would like to implement something of this sort for Neo4j. I am guessing this would require a change in how the query engine deals with queries sent to it. Or worse :)
Here are some links that demonstrate the idea
http://www.cs.cmu.edu/~dchau/graphite/graphite.pdf
http://www.cidrdb.org/cidr2013/Papers/CIDR13_Paper72.pdf
I am looking for ideas. Anything at all in relation to the topic would be helpful. Thanks in advance.
(:I)<-[:NEEDING_HELP_FROM]-(:YOU)
From my point of view, the better option for you is to create Unmanaged Extensions, because they let you add your own custom functionality to the Neo4j server.
You are not able to extend the Cypher language without maintaining your own fork of the source code.
After taking part in a very interesting tutorial with a focus on Cypher, I was pleasantly surprised by the declarativeness of the Cypher query language. It's a very natural way of retrieving data from Neo4j, in my opinion.
Before that, I had only used the native API. And while that is less declarative, you sort of get used to it after a while. The complex constructions are all very similar and vary only in the details for my specific project.
Still, Cypher looked more natural to me, so I am contemplating building the second version of my application with mainly Cypher queries to interact with my database. But I encountered an issue.
There are numerous ways to convert my application to Cypher, and after trying several possible queries, all with the desired result, it appears even the fastest query is still about 20 times slower than the native API.
Now, I don't mind giving up some performance for declarativeness, but a factor of 20 is a little too much for me in an application that's already struggling with performance. Is there a workaround for this issue, or do I just have to stick with the native API?
Your conclusion sounds very familiar to me. I've also had performance issues when I used Neo4j and Spring Data Neo4j together. In the parts where performance really mattered, I switched to the core Traversal API which right now is significantly faster than an average Cypher query. This has a lot to do with the fact that there's no processing of a query and the fact that you control every aspect of the traversal. Cypher can only guess what the most optimal strategy is. I'm convinced that it will gain speed in the (near) future, but if performance really matters, I'd say stick with the core API.
Also, if you are using Java and Spring Data Neo4j, consider using the advanced mapping mode (AspectJ), which is a lot faster than the simple mapping mode.
I am working in a delivery company. We currently solve routes with 50+ locations by "hand".
I have been thinking about using Google Maps API to solve this problem, but I have read that there is a 24 points limit.
Currently we are using Rails on our server, so I am thinking about using a Ruby script that would take the coordinates of the 50+ locations and output a reasonable solution.
What algorithm would you use to approach this problem?
Is Ruby a good programming language to solve this type of problem?
Do you know of any existing Ruby script?
This might be what you are looking for:
Warning:
This site gets flagged by Firefox as an attack site, but it doesn't appear to be one. In fact, I have used it before without a problem.
[Check revision history for URL]
rubyquiz seems to be down (and has been for a while); however, you can still use the Wayback Machine at archive.org to see that page:
http://web.archive.org/web/20100105132957/http://rubyquiz.com/quiz142.html
Even with the DP solution mentioned in another answer, 50 locations would require on the order of 2^50, roughly 10^15, operations. So you're going to have to look at approximate solutions, which are probably acceptable given that you currently do them by hand. Look at http://en.wikipedia.org/wiki/Travelling_salesman_problem#Heuristic_and_approximation_algorithms
Here are a couple of tricks:
1: Lump locations that are relatively close into one graph, and turn those locations into a single node in your main graph. This lets you be greedy without too much work.
2: Use an approximation algorithm.
2a: My favorite is bitonic tours. They're pretty easy to hack up.
See Update
Here's a py lib with a bitonic tour and here's another
Let me go look for a Ruby one. I'm having trouble finding more than just the RGL, which has efficiency issues....
Update
In your case, the minimum spanning tree approach should be effective. I can't think of a case where your cities wouldn't satisfy the triangle inequality, and when it holds, a tour built from the MST is a fast approximation that is at most twice as long as the optimal tour. That is particularly true if the distance is Euclidean, which, again, I think it must be.
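A minimal Python sketch of that MST approach follows: build a minimum spanning tree over the points, then visit them in the order of a preorder walk of the tree. It assumes straight-line (Euclidean) distances between 2D coordinates; the function names and the four sample points are invented for illustration, and `math.dist` needs Python 3.8 or later.

```python
import math

def mst_tour(points):
    """Return a visiting order (list of indices into points), starting at index 0."""
    n = len(points)
    if n == 0:
        return []
    # Prim's algorithm on the complete graph, O(n^2), which is fine for ~50 stops.
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known connection of each point to the tree
    parent = [None] * n
    children = [[] for _ in range(n)]
    best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] is not None:
            children[parent[u]].append(u)
        for v in range(n):
            d = math.dist(points[u], points[v])
            if not in_tree[v] and d < best[v]:
                best[v] = d
                parent[v] = u
    # A preorder walk of the tree gives the tour; repeated vertices are skipped implicitly.
    order, stack = [], [0]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(reversed(children[u]))
    return order

def tour_length(points, order):
    """Length of the closed tour that returns to the first point."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )

# Hypothetical stops on the corners of a unit square.
stops = [(0, 0), (0, 1), (1, 1), (1, 0)]
order = mst_tour(stops)
print(order, tour_length(stops, order))
```

The factor-of-two bound relies on the triangle inequality. With real road distances it should be treated as a heuristic starting point, not a guarantee.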
One of the optimized exact solutions uses dynamic programming, but it is still very expensive, O(n^2 * 2^n), which is not feasible for 50 locations unless you use some clustering and distributed computing; Ruby on a single server won't be very useful for that.
I would recommend coming up with a greedy criterion instead of using DP or brute force; it would be easier to implement.
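One common greedy criterion is "always go to the nearest unvisited location". Here is a minimal Python sketch of that nearest-neighbor heuristic, assuming straight-line distances between 2D coordinates; the function name and sample points are made up, and the result carries no optimality guarantee.

```python
import math

def nearest_neighbor_tour(points, start=0):
    """Greedy tour: from the current point, always move to the closest unvisited one."""
    unvisited = set(range(len(points)))
    unvisited.remove(start)
    order = [start]
    while unvisited:
        last = points[order[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(last, points[i]))
        unvisited.remove(nxt)
        order.append(nxt)
    return order

# Hypothetical delivery stops along a street.
stops = [(0, 0), (5, 0), (1, 0), (2, 0)]
print(nearest_neighbor_tour(stops))
```

Each step scans all remaining points, so the whole tour costs O(n^2) distance evaluations, which is negligible for 50 locations.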
Once your program ends you can do some memoization, and store the results somewhere for later lookups, which can as well save you some cycles.
In terms of the code, you'll need to implement vertices and edges that have weights,
i.e. a vertex class that has weighted edges (recursively pointing to other vertices), and then a graph class that will populate the data.
I worked on using metaheuristic algorithms such as Ant Colony Optimization to solve TSP problems for the Bays29 (29-city) problem, and it gave me close-to-optimal solutions in a very short time. You can potentially use the same.
I wrote it in Java, though. I will link it here anyway, because I am currently working on a port to Ruby:
Java: https://github.com/mohammedri/ant_colony_java_TSP
Ruby: https://github.com/mohammedri/aco-ruby (incomplete)
This is the dataset it solves for: https://github.com/jorik041/osmsharp/blob/master/Core/OsmSharp.Tools/Benchmark/TSPLIB/Problems/TSP/bays29.tsp
Keep in mind I am using the Euclidean distance between each pair of cities, i.e. the straight-line distance. I don't think that is ideal in a real-life situation, considering roads, a city map, etc., but it may be a good starting point :)
If you want the cost of the solution produced by the algorithm to be within 3/2 of the optimum, then you want the Christofides algorithm. ACO and GA don't have a guaranteed cost.