I want to look up the top 5 (shortest) paths in my graph (Neo4j 3.0.4) from point A to point Z.
The graph consists of several nodes that are connected by the relationship "CONNECTED_TO". This relationship has a cost property that should be minimized.
I started with this:
MATCH p=(from:Stop{stopId:'A'}), (to:Stop{stopUri:'Z'}),
path = allShortestPaths((from)-[:CONNECTED_TO*]->(to))
WITH REDUCE (total = 0, r in relationships(p) | total + r.cost) as tt, path
RETURN path, tt
This query always returns the subgraph with the fewest hops; the cost property is not considered. There is another subgraph with more hops that has a lower total cost. What am I doing wrong?
Furthermore, I actually want to get the TOP 5 subgraphs. If I execute this query:
MATCH p=(from:Stop{stopUri:'A'})-[r:CONNECTED_TO*10]->(to:Stop{stopUri:'Z'}) RETURN p
I can see several paths, but the first query returns just one path.
The path should not contain loops etc. of course.
I want to execute this query via the REST API, so a REST call or a Cypher query should do it.
EDIT1:
I want to execute this as a REST call, so I tried the Dijkstra algorithm. This seems to be a good way, but I have to calculate the weight by adding 3 different cost properties of the relationship. How could this be achieved?
allShortestPaths will find the shortest path between two points and then match every path that has the same number of hops. If you want to minimize based on cost rather than traversal length, try something like this:
MATCH (from:Stop{stopId:'A'}), (to:Stop{stopUri:'Z'}),
path = (from)-[:CONNECTED_TO*]->(to)
WITH REDUCE(total = 0, r IN relationships(path) | total + r.cost) AS cost, path
ORDER BY cost
RETURN path LIMIT 5
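To see why ranking by summed cost can give a different answer than ranking by hop count, here is a minimal in-memory sketch in Python. It is not Neo4j code: the toy graph, the costs, and the helper names (simple_paths, path_cost, cheapest_paths) are assumptions made up for illustration. It enumerates loop-free paths, sorts them by total cost, and keeps the first k, which is roughly what the ORDER BY cost ... LIMIT 5 query asks the database to do.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Graph = Dict[str, Dict[str, float]]  # node -> {neighbour: cost of that edge}


def simple_paths(graph: Graph, start: str, goal: str,
                 path: Optional[List[str]] = None) -> Iterator[List[str]]:
    """Yield every loop-free path from start to goal (depth-first)."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for nxt in graph.get(start, {}):
        if nxt not in path:  # never revisit a node, so no loops
            yield from simple_paths(graph, nxt, goal, path)


def path_cost(graph: Graph, path: List[str]) -> float:
    """Sum the edge costs along a path."""
    return sum(graph[a][b] for a, b in zip(path, path[1:]))


def cheapest_paths(graph: Graph, start: str, goal: str,
                   k: int = 5) -> List[Tuple[List[str], float]]:
    """Return up to k loop-free paths, cheapest first."""
    ranked = sorted(simple_paths(graph, start, goal),
                    key=lambda p: path_cost(graph, p))
    return [(p, path_cost(graph, p)) for p in ranked[:k]]


# Hypothetical data: the direct edge has the fewest hops but the highest cost.
graph = {
    "A": {"Z": 10, "B": 1},
    "B": {"C": 1},
    "C": {"Z": 1},
}
top = cheapest_paths(graph, "A", "Z")
for path, cost in top:
    print(" -> ".join(path), cost)
```

Like the Cypher query, this enumerates every path before sorting, so it is only practical on small or depth-limited graphs.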
We have data about nodes that are connected to each other by relationships we call "cable".
The number of nodes is 349 and the number of cables is 924.
We need to find the possible paths (not just the shortest) between two nodes, and used this:
MATCH p=(n:location)-[*]-(m:location)
WHERE n.lo_id = 70 AND m.lo_id = 486
AND ALL(x IN NODES(p) WHERE SINGLE(y IN NODES(p) WHERE y = x))
return p
But it failed. I ran EXPLAIN and saw in the plan that "VarLengthExpand(Into)#neo4j" has about
67,837,845,872,747,150,000 estimated rows!
What's wrong with this query?
I'm a newbie with Neo4j. Should I put an index on the fields or rewrite the query?
Would you please help me make it work and find the possible paths between the nodes with a good query?
Cypher version: CYPHER 4.4, planner: COST, runtime: INTERPRETED.
It would be better to use APOC path expand config,
https://neo4j.com/labs/apoc/4.1/graph-querying/expand-paths-config/
set uniqueness to NODE_PATH
use bfs: false
use your end node as terminator node
possibly set a max depth, at least while you're testing
What do you want to do with all those billions of paths?
Basically path expansion is degree to the power of hops.
So for a graph with an average degree of 10, a 10-hop expansion would yield on the order of 10^10 paths.
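The growth described above is easy to check with a few lines of Python. This is only a back-of-the-envelope sketch: estimated_paths is a hypothetical helper, and degree ** hops is a rough upper bound that ignores uniqueness constraints. The node and cable counts are the ones given in the question.

```python
def estimated_paths(avg_degree: float, hops: int) -> float:
    """Rough upper bound on the number of paths explored: degree ** hops."""
    return avg_degree ** hops


# The example from the text: average degree 10, 10 hops.
ten_by_ten = estimated_paths(10, 10)

# Average degree of an undirected graph is 2 * edges / nodes; the figures
# 349 (nodes) and 924 (cables) are taken from the question.
avg_degree = 2 * 924 / 349

for hops in (5, 10, 20):
    print(hops, f"{estimated_paths(avg_degree, hops):.3g}")
```

Even with a modest average degree, an unbounded [*] expansion grows too quickly to enumerate, which is why a max depth matters.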
I would really suggest to first figure out what you want to do with all the paths and then express better how you want to guide the expansion.
If you have business rules, you can use the traversal API from a user defined procedure to guide the traversal/expansion at each step. (See the implementation of that APOC procedure on GitHub)
Neo4j version: 3.5.16
What kind of API / driver do you use: Python API with py2neo to run the query with graph.run()
Py2neo version: 4.3.0.
Hey all,
I'm trying to optimize a cypher query to retrieve a variable length path.
The graph is created each time data arrives, and startNode and endNode are fixed by their name property. Once the graph is created, I have a startNode and an endNode, and the objective is:
"From all the possible paths with a minimum length of X and a maximum length of Y, I want to find the shortest path that yields the highest aggregated relationship value".
What I have actually managed to do is: "get the path of length between X and Y that yields the highest aggregated relationship value" with the following Cypher query:
MATCH path = (startN:Batch { name: $startNode })-[:CHANGES_TO*4..7]->(endN:Batch { name: $endNode })
RETURN path,
REDUCE (s = 1, r IN RELATIONSHIPS(path) | s * r.rateValue) AS finalBatchValue
ORDER BY finalBatchValue DESC
LIMIT 1
However, it takes some time to run. Could someone provide ideas on how to optimize this, both to accomplish the shortest-path objective and to make the query run faster if possible?
I tried to make it work with APOC methods like allShortestPaths or Dijkstra with no success; they ended up returning the shortest path and I wasn't able to fix the minimum number of nodes to consider.
Any help is much appreciated.
Since you are basically looking for the strongest path of the shortest length, you can work around the performance problem with a trick.
You can query the graph with a fixed length rather than a variable length. You then have to query n2-n1+1 times (in your case 4 times): first with length 4, then 5, and so on.
You can stop querying as soon as you find a path.
This approach greatly reduces the data loaded by each query, but you have to hit the graph multiple times. Most likely the average time taken by the four-query approach will be less than that of a single query with variable length.
The reason is that you don't compute all the longer paths if you find a solution at a shorter length.
The longer the path gets, the more the time taken grows, roughly exponentially.
This is not possible using Cypher alone. One way is to write a Neo4j procedure in Java and call it from a Cypher query.
The second way is to hit Neo4j with several different queries.
Here is Python code for your case:
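# Assumes `graph` is a connected py2neo Graph and that `start_node` and
# `end_node` hold the two Batch names.
query_template = '''
MATCH path = (startN:Batch { name: $startNode })-[:CHANGES_TO*LENGTH_PARAM]->(endN:Batch { name: $endNode })
RETURN path,
REDUCE (s = 1, r IN RELATIONSHIPS(path) | s * r.rateValue) AS finalBatchValue
ORDER BY finalBatchValue DESC
LIMIT 1
'''
final_result = None
for length in range(4, 8):
    # Substitute the fixed length into a fresh copy of the template on each pass
    query = query_template.replace('LENGTH_PARAM', str(length))
    records = graph.run(query, startNode=start_node, endNode=end_node).data()
    if records:
        # The first length that yields a path is the shortest one; stop here
        final_result = records[0]['path']
        break
# your implementation: use final_result (None if no path was found)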
That's how it works. In the worst case you need to hit the graph four times for each start/end node pair. Network calls increase, but the average time taken should be reduced.
With a Java plugin it can be reduced to one hit, like the previous query, because you can do the loop inside the Java code.
I want to find the shortest path between two nodes. The path itself is not the problem... The bigger problem is that I want to return the path where the aggregated relationship property along the path is highest.
For better understanding, here's what I want:
This is my query
MATCH
(startNode:Person {id:"887111"}),
(endNode:Person {id:"789321"}),
paths = allShortestPaths((startNode)-[r:KNOWS *..20]-(endNode))
RETURN paths
In this example I want to have the path from Elissa (id: 887111) to Kasey (id: 789321) where the aggregated count on the relationships is at its maximum.
I've also had a look at 'shortestPath', which only gives me one path. The other way is to call the 'dijkstra' algorithm, but with this I only get the path with the lowest 'cost' (and not the highest).
So in my example the only path that should show up is Elissa->Travon->Kasey
I'd think the problem isn't that complex, but at the moment I'm stuck with this.
Thanks so far in advance.
UPDATE
after calling the suggested query
MATCH (startNode:Person {id:"789321"}), (endNode:Person {id:"887111"})
CALL apoc.algo.dijkstra(startNode, endNode, 'KNOWS', '_duration') YIELD path, weight
RETURN path, -weight AS weight
my result is the following
[UPDATED]
I present 2 answers, depending on what you are trying to do.
1. Finding path with max total weight
To find the path with max total weight, you can feed to the Dijkstra algorithm the negation of the original weight properties. The resulting "lowest" total weight will be a negative value that, when negated, will actually be the highest total weight (based on the original weight properties).
There is an APOC procedure, apoc.algo.dijkstra that implements the Dijkstra algorithm, but it does not allow you to use the negative value of the specified weight property. So, to use that procedure, you would need to add a new property to each KNOWS relationship with the appropriate negative value. For example, to add the negative weights to existing relationships (assuming w is the original weight property, and _w will contain the corresponding negative value):
MATCH ()-[k:KNOWS]->()
SET k._w = -k.w;
Once you have the negative weights, the following should give you the path with the max weight:
MATCH (startNode:Person {id:"887111"}), (endNode:Person {id:"789321"})
CALL apoc.algo.dijkstra(startNode, endNode, 'KNOWS', '_w') YIELD path, weight
RETURN path, -weight AS weight;
2. Choosing from the shortest paths the one with maximum total weight
MATCH
(startNode:Person {id:"887111"}),
(endNode:Person {id:"789321"}),
path = allShortestPaths((startNode)-[:KNOWS *..20]-(endNode))
RETURN path, REDUCE(s = 0, r IN RELATIONSHIPS(path) | s + r.duration) AS weight
ORDER BY weight DESC
LIMIT 1;
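The logic of this second query (collect every path with the minimum hop count, then keep the one with the largest summed weight) can be sketched outside the database as follows. This is an illustrative Python sketch, not APOC or Neo4j code: the adjacency map, the weights, and the extra person "Jordan" are hypothetical; only Elissa, Travon and Kasey come from the question.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]  # person -> {neighbour: weight of KNOWS}


def all_shortest_paths(adj: Graph, start: str, goal: str) -> List[List[str]]:
    """Return every path from start to goal that uses the minimum hop count."""
    dist = {start: 0}
    preds: Dict[str, List[str]] = {start: []}
    queue = deque([start])
    while queue:  # breadth-first search, recording all shortest predecessors
        node = queue.popleft()
        for nxt in adj.get(node, {}):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                preds[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                preds[nxt].append(node)
    if goal not in dist:
        return []

    def build(node: str) -> List[List[str]]:
        if node == start:
            return [[start]]
        return [p + [node] for prev in preds[node] for p in build(prev)]

    return build(goal)


def total_weight(adj: Graph, path: List[str]) -> float:
    """Sum the relationship weights along a path."""
    return sum(adj[a][b] for a, b in zip(path, path[1:]))


# Hypothetical undirected KNOWS graph (each edge listed in both directions).
adj = {
    "Elissa": {"Travon": 5, "Jordan": 1},
    "Travon": {"Elissa": 5, "Kasey": 4},
    "Jordan": {"Elissa": 1, "Kasey": 2},
    "Kasey": {"Travon": 4, "Jordan": 2},
}
paths = all_shortest_paths(adj, "Elissa", "Kasey")
best = max(paths, key=lambda p: total_weight(adj, p))
best_weight = total_weight(adj, best)
print(best, best_weight)
```

Because only minimum-hop paths are considered, a longer path with a higher total weight would be ignored, which matches the behaviour of allShortestPaths followed by ORDER BY weight DESC.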
I created a Neo4j database with the Cypher statement here: https://gist.github.com/neoecos/8748091
I want to know how to get:
1. The paths with the fewest transfers (order by transfers)
2. The shortest path (order by path length)
3. The optimal path (fewest transfers and shortest path)
Please give the corresponding queries.
And do you think this is the best way to build a bus inquiry system?
Thanks a lot.
The shortest path is pretty easy:
MATCH path=shortestPath((station_44:STATION {id:44})-[*0..10]-(station_46:STATION {id:46}))
RETURN path
As far as counting transfers you can do something like this:
MATCH path=allShortestPaths((station_44:STATION {id:44})-[rels*0..10]-(station_46:STATION {id:46}))
RETURN length(path) AS stop_count, length(FILTER(index IN RANGE(1, length(rels)-1) WHERE (rels[index]).bus <> (rels[index - 1]).bus)) AS transfer_count
Once you have those two variables you can calculate / sort however you like. For example:
MATCH path=(station_44:STATION {id:44})-[rels*0..4]-(station_46:STATION {id:46})
WITH length(path) AS stop_count, length(FILTER(index IN RANGE(1, length(rels)-1) WHERE (rels[index]).bus <> (rels[index - 1]).bus)) AS transfer_count
RETURN stop_count, transfer_count
ORDER BY (stop_count * 0.5) + (transfer_count * 2.0) DESC
Here I removed the allShortestPaths call so that you get different lengths of paths. The ORDER BY uses weights on the two metrics. Unfortunately, at least in my DB, if you go beyond a path length of four it starts to get really slow. You might be able to improve that by introducing a direction arrow in the path, if that makes sense in your case.
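The transfer count and the weighted ordering can also be written out in plain Python, which may make the FILTER/RANGE expression easier to follow. This is a sketch under assumptions: each route is represented as the list of bus values on its consecutive relationships, the bus identifiers are made up, the weights 0.5 and 2.0 are the ones from the ORDER BY above, and a lower combined score is assumed to be better.

```python
from typing import List


def count_transfers(buses: List[str]) -> int:
    """Count positions where the bus differs from the previous relationship."""
    return sum(1 for prev, cur in zip(buses, buses[1:]) if prev != cur)


def route_score(buses: List[str], stop_weight: float = 0.5,
                transfer_weight: float = 2.0) -> float:
    """Weighted combination of path length and number of transfers."""
    return len(buses) * stop_weight + count_transfers(buses) * transfer_weight


# Hypothetical routes: one bus id per relationship along the path.
routes = {
    "four stops, no transfer": ["12", "12", "12", "12"],
    "two stops, one transfer": ["12", "7"],
}
# Sort ascending so the route with the lowest combined score comes first.
ranked = sorted(routes, key=lambda name: route_score(routes[name]))
for name in ranked:
    print(name, route_score(routes[name]))
```

With these weights a transfer costs as much as four extra stops, so the longer route without a transfer ranks first.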
I am creating a simple graph DB for transportation between a few cities. My structure is:
Station = physical station
Stop = each station has several stops, depending on time and line ID
Ride = connection between stops
I need to find a route from city A to city C. There is no direct stop connection between them, but they are connected through city B. Please see the picture; as a new user I can't post images in the question.
How can I get a route from City A with STOP 1, connected by RIDE 1 to STOP 2, then
from STOP 2 through the same City B to STOP 3, and finally from STOP 3 by RIDE 2 to STOP 4 (City C)?
Thank you.
UPDATE
The solution from Vince is OK, but I need to filter the STOP nodes by departure time, something like
MATCH p=shortestPath((a:City {name:'A'})-[*{departuretime>xxx}]-(c:City {name:'C'})) RETURN p
Is it possible to do this without iterating over the whole collection of matches? Because that is too slow.
If you are simply looking for a single route between two nodes, this Cypher query will return the shortest path between two City nodes, A and C.
MATCH p=shortestPath((a:City {name:'A'})-[*]-(c:City {name:'C'})) RETURN p
In general if you have a lot of potential paths in your graph, you should limit the search depth appropriately:
MATCH p=shortestPath((a:City {name:'A'})-[*..4]-(c:City {name:'C'})) RETURN p
If you want to return all possible paths you can omit the shortestPath clause:
MATCH p=(a:City {name:'A'})-[*]-(c:City {name:'C'}) RETURN p
The same caveats apply. See the Neo4j documentation for full details.
Update
After your subsequent comment.
I'm not sure what the exact purpose of the time property is here, but it seems as if you actually want to create the shortest weighted path between two nodes, based on some minimum time cost. This is different of course to shortestPath, because that minimises on the number of edges traversed only, not the cost of those edges.
You'd normally model the traversal cost on edges, rather than nodes, but your graph has time only on the STOP nodes (and not for example on the RIDE edges, or the CITY nodes). To make a shortest weighted path query work here, we'd need to also model time as a property on all nodes and edges. If you make this change, and set the value to 0 for all nodes / edges where it isn't relevant then the following Cypher query does what I think you need.
MATCH p=(a:City {name: 'A'})-[*]-(c:City {name:'C'})
RETURN p AS shortestPath,
reduce(time=0, n in nodes(p) | time + n.time) AS m,
reduce(time=0, r in relationships(p) | time + r.time) as n
ORDER BY m + n ASC
LIMIT 1
In your example graph this produces a least cost path between A and C:
(A)->(STOP1)-(STOP2)->(B)->(STOP5)->(STOP6)->(C)
with a minimum time cost of 230.
This path includes two stops you have designated "bad", though I don't really understand why they're bad, because their traversal costs are less than other stops that are not "bad".
Or, use Dijkstra
This simple Cypher will probably not be performant on densely connected graphs. If you find that performance is a problem, you should use the REST API and the path endpoint of your source node, and request a shortest weighted path to the target node using Dijkstra's algorithm. Details here
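For readers unfamiliar with it, the idea behind Dijkstra's algorithm can be sketched in a few lines of Python. This is an in-memory illustration, not the Neo4j REST endpoint: the graph, the node names and the time costs are hypothetical, and the sketch assumes all edge costs are non-negative, which the algorithm requires.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, Dict[str, float]]  # node -> {neighbour: non-negative cost}


def dijkstra(graph: Graph, source: str,
             target: str) -> Optional[Tuple[float, List[str]]]:
    """Return (total cost, path) of a least-cost path, or None if unreachable."""
    queue: List[Tuple[float, str, List[str]]] = [(0, source, [source])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)  # cheapest frontier entry
        if node == target:
            return cost, path
        if node in settled:
            continue  # already reached more cheaply
        settled.add(node)
        for nxt, weight in graph.get(node, {}).items():
            if nxt not in settled:
                heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    return None


# Hypothetical time costs: the two-hop route A-B-D is slower than A-C-B-D.
graph = {
    "A": {"B": 100, "C": 20},
    "C": {"B": 30},
    "B": {"D": 10},
}
result = dijkstra(graph, "A", "D")
print(result)
```

Unlike the MATCH ... ORDER BY ... LIMIT 1 query, this never enumerates every path; it only expands the cheapest frontier node at each step, which is why it scales better on dense graphs.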
Ah ok, if the requirement is to find paths through the graph where the departure time at every stop is no earlier than the departure time of the previous stop, this should work:
MATCH p=(:City {name:'A'})-[*]-(:City {name:'C'})
MATCH (a:Stop) where a in nodes(p)
MATCH (b:Stop) where b in nodes(p)
WITH p, a, b order by b.time
WITH p as ps, collect(distinct a) as as, collect(distinct b) as bs
WHERE as = bs
WITH ps, last(as).time - head(as).time as elapsed
RETURN ps, elapsed ORDER BY elapsed ASC
This query works by matching every possible path, and then collecting all the stops on each matched path twice over. One of these collections of stops is ordered by departure time, while the other is not. Only if the two collections are equal (i.e. number and order) is the path admitted to the results. This step evicts invalid routes. Finally, the paths themselves are ordered by least elapsed time between the first and last stop, so the quickest route is first in the list.
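The validity rule the query applies (a route is admitted only if its stops are already in departure-time order, and valid routes are ranked by elapsed time) can be expressed directly in Python. This is a sketch with hypothetical route names and departure times, not a translation of the Cypher query; each route is reduced to the list of stop times in path order.

```python
from typing import Dict, List, Optional


def is_time_ordered(times: List[int]) -> bool:
    """True if no stop departs earlier than the stop before it."""
    return all(earlier <= later for earlier, later in zip(times, times[1:]))


def quickest_valid_route(routes: Dict[str, List[int]]) -> Optional[str]:
    """Name of the valid route with the least time between first and last stop."""
    valid = {name: times for name, times in routes.items()
             if times and is_time_ordered(times)}
    if not valid:
        return None
    return min(valid, key=lambda name: valid[name][-1] - valid[name][0])


# Hypothetical departure times (e.g. minutes since midnight) in path order.
routes = {
    "via_B": [100, 130, 200],
    "goes_back_in_time": [100, 90, 150],  # second stop departs too early
    "slow": [100, 200, 400],
}
quickest = quickest_valid_route(routes)
print(quickest)
```

Comparing each stop with its predecessor gives the same admit/evict decision as comparing the sorted and unsorted collections, without sorting anything.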
Normal warnings about performance, etc. apply :)