I am new to Neo4j and trying to understand how I can optimize routing queries.
I am working with an OSM database
and I am trying to calculate the distance from one point to another.
START a=node(760119)
MATCH path=(a)-[:NEXT|NODE*1..30]-(c)
WHERE HAS(c.node_osm_id) AND c.node_osm_id=283103898
RETURN DISTINCT reduce(
distance = 0, n in filter(
x in path where has(x.length)
) | distance + n.length
) AS distance order by distance
My query returns a set of distances.
319.5609607071325
320.0901127819706
321.64043860878735
332.13372820085
334.21320610250484
How can I rewrite the query so that it stops exploring a path once its distance exceeds the shortest distance found so far?
Thanks in advance.
Cypher doesn't have support for shortest path with cost evaluation yet (as of 2.0-RC1). If you need to use a more efficient shortest path algorithm, you'll need to implement an unmanaged extension.
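However, I do see where you might be able to improve your performance: have you tried making c a second start point (looked up directly, rather than filtered in the WHERE clause), so that the match is anchored at both ends?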
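To illustrate what "stop looking once the distance is longer than the shortest" means outside of Cypher, here is a minimal Python sketch of Dijkstra's algorithm that returns as soon as the target is reached. It is not Neo4j code: the `roads` dictionary and the function name `shortest_distance` are made up for the example, and the segment lengths are invented.

```python
import heapq

def shortest_distance(graph, source, target):
    """Dijkstra's algorithm; returns as soon as the target is settled."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist  # every remaining queue entry is at least this long
        if dist > best.get(node, float("inf")):
            continue  # stale entry: a shorter route to this node was already found
        for neighbor, length in graph.get(node, []):
            candidate = dist + length
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return None  # target not reachable

# Hypothetical road fragment: node -> [(neighbor, segment length)]
roads = {
    "a": [("b", 100.0), ("d", 250.0)],
    "b": [("a", 100.0), ("c", 220.0)],
    "c": [("b", 220.0), ("d", 90.0)],
    "d": [("a", 250.0), ("c", 90.0)],
}
print(shortest_distance(roads, "a", "c"))
```

Because the queue always yields the closest unexplored node first, longer alternatives are never expanded once the target has been popped. This is the behaviour the question asks for, and an unmanaged extension could provide it.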
I'm new to Cypher and Neo4j, but I find it really interesting and am trying to use it to solve a math problem that I have. To make the problem easy to illustrate, I've scaled it down, and I'm hoping you can help me find the right logic.
The Math problem: Given a set of tiles, how many ways can you select 3 tiles, with the sum less than x?
In my example, let's just say that I have 5 tiles (100, 100, 80, 80, 50), and that I have to include at least one 100-tile, and that x is 270.
Since the order doesn't matter, the way I think about the problem is that I start at the highest nr, and then from there can choose to go to either the same nr again, or the next lower number, or the second lower number. This would mean, that starting at 100, I could choose to select either another 100, or 80 (the next lower one), or 50 (the second lower one).
So far, I'm able to define a path starting at 100 and going 2 steps further to m:
MATCH path = (n:Node {value:100})-[:CONNECTED*2]-(m)
QUESTION:
How do I find all paths with a specific sum of the nodes.value?
Since the order doesn't matter, I'm only interested in the unique one-way paths. (Meaning, for example, that if I get one path as 100-80-50, then I'm not interested in the path 50-80-100, since that contains the exact same tiles in a different order.)
Thanks!
Do you mean this?
MATCH path = (n:Node {value:100})-[:CONNECTED*2]-(m)
WITH REDUCE(x=0,n in nodes(path)|x+n.value) as expected, [n in nodes(path)|n.value] as listNode
WHERE expected >100
RETURN listNode
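For comparison, the scaled-down tile problem from the question can be checked outside the graph with a few lines of Python. This is only a sketch of the counting logic (unique selections regardless of order, sum below the limit, at least one required tile); the function name `tile_selections` and its parameters are made up for the example.

```python
from itertools import combinations

def tile_selections(tiles, size, limit, required):
    """Distinct selections of `size` tiles that contain `required` and sum below `limit`."""
    found = set()
    # Sorting first makes every combination come out in the same order,
    # so 100-80-50 and 50-80-100 collapse into a single tuple in the set.
    for combo in combinations(sorted(tiles, reverse=True), size):
        if required in combo and sum(combo) < limit:
            found.add(combo)
    return sorted(found, reverse=True)

selections = tile_selections([100, 100, 80, 80, 50], size=3, limit=270, required=100)
for selection in selections:
    print(selection, sum(selection))
```

With the five tiles from the question this leaves three selections: (100, 100, 50), (100, 80, 80) and (100, 80, 50). The selection (100, 100, 80) is dropped because its sum is 280.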
I'm trying to get all vertices up to a certain maximum cumulative weight (distance) from a specific node. The query
MATCH route = (p1:ReferencePlace) - [roadlist:EROAD*..5] - (p2:ReferencePlace)
WITH p1, p2, route, roadlist,
REDUCE(sum=0.0, road in roadlist | sum + toFloat(road.distance)) as totaldistance
WHERE totaldistance < 300
AND p1.name = "Paris"
RETURN p1, p2, totaldistance
produces the right output. This uses the E-road data example imported like here. It returns all places which are less than 300 km from Paris.
The problem is that this only works for a limited number of hops ([roadlist:EROAD*..5]). For [roadlist:EROAD*] it "hangs" (I don't know whether it would finish in any reasonable amount of time). This is because Neo4j first finds all possible routes and then filters them, so it makes sense that the number of routes gets infeasibly large even for a small graph like this.
In theory it would be no big deal to implement a BFS algorithm from scratch which gathers all relevant vertices as long as the cumulative distance is smaller than the threshold, and only visits the relevant ones. But I'm wondering whether there's a Neo4j way of doing this.
The problem might be cycles in the graph, as suggested here.
I ended up using the algo.shortestPath.deltaStepping.stream algorithm
PROFILE MATCH (start:ReferencePlace {name:'Paris'})
CALL algo.shortestPath.deltaStepping.stream(start, 'distance', 3.0, {relationshipQuery: 'EROAD'})
YIELD nodeId, distance
WHERE distance <= 800
RETURN algo.asNode(nodeId) AS destination, distance
This seems to be doing what I want, even though conceptually it's not a straightforward approach to my problem. I will see how it scales for larger graphs.
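The "from scratch" idea mentioned in the question (visit only vertices whose cumulative distance stays under the threshold) can be sketched in plain Python as a Dijkstra search with a cutoff. This is an illustration of the technique, not what the algo procedure does internally; the place names other than Paris and all distances below are invented.

```python
import heapq

def within_distance(graph, source, limit):
    """Return {node: shortest distance} for every node closer than `limit`."""
    settled = {}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node in settled:
            continue  # already reached by a shorter route
        settled[node] = dist
        for neighbor, length in graph.get(node, []):
            candidate = dist + length
            # Routes at or beyond the limit are never queued, so the search
            # stays local and cycles cannot make it run forever.
            if neighbor not in settled and candidate < limit:
                heapq.heappush(queue, (candidate, neighbor))
    return settled

# Hypothetical road network: place -> [(neighbor, distance in km)]
roads = {
    "Paris": [("A", 120.0), ("B", 200.0)],
    "A": [("Paris", 120.0), ("B", 60.0), ("C", 250.0)],
    "B": [("Paris", 200.0), ("A", 60.0)],
    "C": [("A", 250.0)],
}
print(within_distance(roads, "Paris", 300.0))
```

Each vertex is settled at most once, so the work grows with the size of the reachable neighbourhood rather than with the number of distinct routes.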
We are trying to find a way to create a full distance matrix in a Neo4j database, where the distance is defined as the length of the shortest path between any two nodes. Of course, there is the shortestPath method, but looping through all pairs of nodes and calculating their shortest paths gets very slow. We are explicitly not talking about allShortestPaths, because that returns all shortest paths between 2 specific nodes.
Is there a specific method or approach that is fast for a large number of nodes (>30k)?
Thank you!
j.
There is no easier method; the full distance matrix will take a long time to build.
As you've described it, the full distance matrix must contain the shortest path between any two nodes, which means you will have to get that information at some point. Iterating over each pair of nodes and running a shortest-path algorithm is the straightforward way to do this; with n nodes there are O(n^2) pairs, so the total cost is O(n^2) multiplied by the complexity of a single shortest-path query.
But you can cut down on the runtime with a dynamic programming solution.
For instance, if you are trying to find the shortest path between (A) and (C), and have already calculated the shortest path from (B) to (C), then if you happen to encounter (B) while pathfinding from (A), you do not need to recalculate the rest of the cost of that path; it is known.
However, a dynamic programming solution of any reasonable complexity will almost certainly be best written as a separate module and packaged as a Neo4j plugin. If what you are doing is a one-time operation, or an operation that won't be run frequently, it might be easier to use the naive solution of calling shortestPath between each pair. If you plan to run it fairly frequently on dynamic data, it might be worth authoring a custom plugin. It depends entirely on your needs.
No matter what, though, it will take some time to calculate. The dynamic programming solution will cut down on the time greatly (especially in a densely-connected graph), but it will still not be very fast.
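One way to reuse work across pairs, sketched here in plain Python rather than as a Neo4j plugin, is to run one breadth-first search per node: a single traversal yields the hop distance from that node to every other node, so n traversals fill the whole matrix. This assumes distance means the number of relationships on the shortest path, as in the question; the function names and the toy graph are made up.

```python
from collections import deque

def bfs_distances(adjacency, source):
    """Hop distance from `source` to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in distances:  # first visit is the shortest in BFS
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances

def distance_matrix(adjacency):
    """One BFS per node; matrix[a][b] is the shortest hop count from a to b."""
    return {node: bfs_distances(adjacency, node) for node in adjacency}

# Hypothetical undirected chain a - b - c - d
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
matrix = distance_matrix(graph)
print(matrix["a"]["d"])
```

For n nodes and m relationships this is on the order of n * (n + m) steps, which is still substantial for more than 30k nodes but avoids issuing a separate query for every pair.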
What is the end game? Is this a one-time query that resets some property or creates new edges, or a frequently recurring effort? If it's one-time, you might create edges between the two nodes at each step, building a transitive closure. Each edge would point between the two nodes and have the distance as a property.
Thus, if the path is a>b>c>d, you would create the edges
a>b 1
a>c 2
a>d 3
b>c 1
b>d 2
c>d 1
The edges could be named distinctively to distinguish them from the original path edges. This could create circular paths, which may either negate this strategy or require a constraint. If you are dealing with directed acyclic graphs, it would work well.
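The edge list above can be generated mechanically from a path. A small Python sketch follows, assuming every step on the path has distance 1 as in the example; the function name `closure_edges` is made up.

```python
def closure_edges(path):
    """All (start, end, distance) shortcuts implied by an ordered path of unit steps."""
    return [
        (path[i], path[j], j - i)
        for i in range(len(path))
        for j in range(i + 1, len(path))
    ]

for start, end, distance in closure_edges(["a", "b", "c", "d"]):
    print(f"{start}>{end} {distance}")
```

A path with k nodes produces k * (k - 1) / 2 shortcut edges, so the closure grows quadratically with path length.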
I am struggling to find an efficient algorithm which will give me all possible paths between 2 nodes in a directed graph.
I found the RGL gem, which is the fastest so far in terms of calculations. I am able to find the shortest path using Dijkstra's shortest path algorithm from the gem.
I googled and found many solutions (Ruby and non-Ruby), but either I couldn't convert the code or the code takes forever to run (inefficient).
I am here primarily to ask whether someone can suggest a way to find all paths by using or tweaking the algorithms from the RGL gem itself (if possible), or some other efficient way.
The directed graph input can be an array of arrays:
[[1,2], [2,3], ..]
P.S. : Just to avoid negative votes/comments, unfortunately I don't have inefficient code snippet to show as I discarded it days ago and didn't save it anywhere for the record or reproduce here.
The main problem is that the number of paths between two nodes can grow exponentially with the overall number of nodes. Thus any algorithm that finds all paths between two nodes will be very slow on larger graphs.
Example:
As an example, imagine a grid of n x n nodes, each connected to its 4 neighbors. Now you want to find all paths from the bottom-left node to the top-right node. Even when you only allow moves to the right (r) and moves up (u), each resulting path needs n-1 moves of each kind, so it can be described by a string of length 2(n-1) with an equal number of r's and u's. This gives "2(n-1) choose (n-1)" possible paths (ignoring other moves and cycles).
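To get a feeling for how fast this grows, here is a short Python sketch that counts the right/up-only paths directly and compares the result with the binomial coefficient. It is parameterized by `steps`, the number of moves needed in each direction (an assumption of this sketch, chosen so the count is "2 * steps choose steps"); the function name is made up.

```python
from math import comb

def monotone_paths(steps):
    """Count right/up-only corner-to-corner paths needing `steps` moves each way."""
    # ways[col] holds the number of paths reaching column `col` in the current row;
    # the bottom row can only be reached by moving right, hence all ones.
    ways = [1] * (steps + 1)
    for _ in range(steps):  # move up one row at a time
        for col in range(1, steps + 1):
            ways[col] += ways[col - 1]  # arrive from below or from the left
    return ways[steps]

for steps in (2, 5, 10, 15):
    print(steps, monotone_paths(steps), comb(2 * steps, steps))
```

Already at 15 moves per direction there are more than 150 million such paths, before counting any path that also moves left or down.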
I know it is possible to get the shortest path, in terms of the minimum number of nodes, by using Cypher and Gremlin. How about getting a path with minimum traversal cost? One example I can think of is a bus route. Some routes may have fewer bus stops (nodes) but need a longer time (cost) to travel from one stop to another; for others it is the reverse.
Is it possible to get the shortest path with minimum travel time by using Cypher or Gremlin?
See this other question for more on shortest paths. To answer this specific question about calculating the cost of a path, I first altered the toy graph so that the total weight from marko to josh to lop is cheaper than from marko directly to lop:
gremlin> g = TinkerGraphFactory.createTinkerGraph()
==>tinkergraph[vertices:6 edges:6]
gremlin> g.e(8).weight = 0.1f
==>0.1
gremlin> g.e(11).weight = 0.1f
==>0.1
Then to calculate the "cost" of the paths between marko and lop:
gremlin> g.v(1).outE.inV.loop(2){it.object.id!="3" && it.loops< 6 }.path.transform{[it.findAll{it instanceof Edge}.sum{it.weight}, it]}
==>[0.4, [v[1], e[9][1-created->3], v[3]]]
==>[0.20000000298023224, [v[1], e[8][1-knows->4], v[4], e[11][4-created->3], v[3]]]
Note that the path of length 3 (marko to josh to lop) is cheaper than the direct path from marko to lop. In any case, the Gremlin basically says:
g.v(1).outE.inV.loop(2){it.object.id!="3" && it.loops< 6 }.path - grab the paths between marko and lop.
.transform{[it.findAll{it instanceof Edge}.sum{it.weight}, it]} - transform each path into a list where the first value is the sum of the weight properties and the second value is the path list itself. I calculate the total weight with a bit of groovy on the path list itself by finding all items in the path that are edges, then summing their weight values.
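The same "enumerate the paths, then sum the edge weights" idea can be sketched in plain Python for readers without a Gremlin console. Only the three edges that appear in the output above are included, with vertex ids as in the toy graph (1 = marko, 4 = josh, 3 = lop); the function name `paths_with_cost` and the `max_edges` guard (standing in for the loop limit) are made up for the example.

```python
def paths_with_cost(edges, start, goal, max_edges=6):
    """Enumerate simple directed paths from start to goal with their total weight."""
    results = []

    def walk(vertex, path, cost):
        if vertex == goal:
            results.append((cost, path))
            return
        if len(path) - 1 >= max_edges:
            return  # give up on overly long paths
        for (tail, head), weight in edges.items():
            if tail == vertex and head not in path:  # follow out-edges, avoid cycles
                walk(head, path + [head], cost + weight)

    walk(start, [start], 0.0)
    return sorted(results)  # cheapest path first

# (out vertex, in vertex) -> weight, after the two weights were lowered to 0.1
edges = {(1, 3): 0.4, (1, 4): 0.1, (4, 3): 0.1}
for cost, path in paths_with_cost(edges, 1, 3):
    print(round(cost, 3), path)
```

Sorting by the summed weight puts the two-hop route through josh ahead of the direct edge. Like the Gremlin version, this enumerates every path, so for larger graphs a Dijkstra-style search is the better fit.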
You can look at, and probably use, this one:
http://components.neo4j.org/neo4j-graph-algo/stable/apidocs/org/neo4j/graphalgo/GraphAlgoFactory.html#dijkstra(org.neo4j.graphdb.RelationshipExpander, org.neo4j.graphalgo.CostEvaluator)
Here are some tests showing other built-in algorithms that you might be able to use:
https://github.com/neo4j/neo4j/tree/master/community/graph-algo/src/test/java/org/neo4j/graphalgo/shortestpath
To roll your own algorithm, you can call the Neo4j Java API, or even Gremlin/Groovy pipes, with something like this:
http://neo4j-contrib.github.io/gremlin-plugin/#rest-api-send-an-arbitrary-groovy-script---lucene-sorting