We have data about a set of nodes that are connected to each other by relationships we call "cable".
The graph has 349 nodes and 924 cables.
We need to find the possible paths (not just the shortest) between two nodes, and used this query:
MATCH p=(n:location)-[*]-(m:location)
WHERE n.lo_id = 70 AND m.lo_id = 486
AND ALL(x IN NODES(p) WHERE SINGLE(y IN NODES(p) WHERE y = x))
return p
But it failed. I ran EXPLAIN and saw in the plan that "VarLengthExpand(Into)#neo4j" has about
67,837,845,872,747,150,000 estimated rows!
What's wrong with this query?
I'm new to Neo4j. Should I put an index on the fields or rewrite the query?
Would you please help me make it work and find the possible paths between the nodes with a good query?
Cypher version: CYPHER 4.4, planner: COST, runtime: INTERPRETED.
It would be better to use the APOC path expander with a config map:
https://neo4j.com/labs/apoc/4.1/graph-querying/expand-paths-config/
set uniqueness to NODE_PATH
use bfs: false
use your end node as the terminator node
possibly set a max depth, at least while you're testing
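To make those settings concrete, here is a minimal in-memory sketch in Python of what they ask the expander to do: a depth-first search (bfs: false) that never revisits a node within one path (NODE_PATH uniqueness), stops expanding at the end node (terminator node), and respects a maximum depth. This is not the APOC implementation; the function name and the toy graph are made up for illustration.

```python
from typing import Dict, Iterator, List


def simple_paths(adj: Dict[int, List[int]], start: int, end: int,
                 max_depth: int) -> Iterator[List[int]]:
    """Yield every path from start to end that visits no node twice."""
    stack = [(start, [start])]          # explicit stack gives depth-first order
    while stack:
        node, path = stack.pop()
        if node == end:
            yield path                  # terminator node: report, do not expand further
            continue
        if len(path) - 1 >= max_depth:
            continue                    # depth limit, counted in relationships
        for nxt in adj.get(node, []):
            if nxt not in path:         # node uniqueness within the current path
                stack.append((nxt, path + [nxt]))


# Hypothetical toy graph, undirected, stored as adjacency lists.
toy = {70: [1, 2], 1: [70, 2, 486], 2: [70, 1, 486], 486: [1, 2]}
paths = sorted(simple_paths(toy, 70, 486, max_depth=3))
print(paths)
```

Lowering max_depth to 2 on the toy graph leaves only the two 2-hop paths, which is why a depth limit is useful while testing.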
What do you want to do with all those billions of paths?
Basically, path expansion grows as the degree to the power of the number of hops.
So for a graph with an average degree of 10, a 10-hop expansion can produce about 10^10 paths.
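As a rough back-of-the-envelope check, the same estimate can be applied to the graph from the question (349 nodes, 924 cables). The sketch below only does the arithmetic; it ignores uniqueness constraints and dead ends, so treat the numbers as an order-of-magnitude guide, not a prediction of what the planner reports.

```python
def estimated_paths(avg_degree: float, hops: int) -> float:
    """Crude estimate: every hop multiplies the number of paths by the degree."""
    return avg_degree ** hops


nodes, cables = 349, 924
avg_degree = 2 * cables / nodes        # each cable touches two nodes
print(f"average degree: {avg_degree:.2f}")
for hops in (5, 10, 20):
    print(hops, f"{estimated_paths(avg_degree, hops):.2e}")
```

An unbounded [*] pattern has no hop limit at all, which is how the estimate can reach numbers like the one shown in the plan.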
I would really suggest to first figure out what you want to do with all the paths and then express better how you want to guide the expansion.
If you have business rules, you can use the traversal API from a user defined procedure to guide the traversal/expansion at each step. (See the implementation of that APOC procedure on GitHub)
Neo4j version: 3.5.16
What kind of API / driver do you use: Python API with py2neo to run the query with graph.run()
Py2neo version: 4.3.0.
Hey all,
I'm trying to optimize a Cypher query to retrieve a variable-length path.
The graph is created each time data arrives, and startNode and endNode are identified by their name property. Once the graph is created, I have a startNode and an endNode, and the objective is:
"From all the possible paths with a minimum length of X and a maximum length of Y, I want to find the shortest path that yields the highest aggregated relationship value".
What I have actually managed to do is "get the path of length between X and Y that yields the highest aggregated relationship value", with the following Cypher query:
MATCH path = (startN:Batch { name: $startNode })-[:CHANGES_TO*4..7]->(endN:Batch { name: $endNode })
RETURN path,
REDUCE (s = 1, r IN RELATIONSHIPS(path) | s * r.rateValue) AS finalBatchValue
ORDER BY finalBatchValue DESC
LIMIT 1
However, it takes some time to run. Could someone provide ideas on how to optimize this, both to meet the shortest-path objective and to make the query run faster if possible?
I tried to make it work with APOC methods like allShortestPaths or Dijkstra with no success; they ended up returning the shortest path and I wasn't able to set the minimum number of nodes to consider.
Any help is much appreciated.
Since you are basically looking for the strongest path among the shortest ones, you can work around the performance problem with a trick.
You can query the graph with a fixed length rather than a variable length. You then have to query up to n2-n1+1 times; in your case 4 times, first with length 4, then 5, and so on.
You can stop querying as soon as you find a path.
This approach greatly reduces the data loaded by each query, at the price of hitting the graph multiple times. Most likely the average time for the four-hit approach will be less than for a single hit with variable length.
The reason is that you don't compute all the longer paths if you find a solution at a shorter length.
The longer the path gets, the more the time taken grows, roughly exponentially.
This is not possible using Cypher alone. One way is to write a Neo4j procedure in Java and call it from a Cypher query.
The second way is to hit Neo4j with a separate query for each length.
I am writing Python code for your case here:
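query_template = '''MATCH path = (startN:Batch { name: $startNode })-[:CHANGES_TO*LENGTH_PARAM]->(endN:Batch { name: $endNode })
RETURN path,
REDUCE (s = 1, r IN RELATIONSHIPS(path) | s * r.rateValue) AS finalBatchValue
ORDER BY finalBatchValue DESC
LIMIT 1'''
# start_name and end_name are placeholders for the two Batch names.
final_result = None
for length in range(4, 8):
    # Build each query from the untouched template; the hop count cannot be a Cypher parameter.
    query = query_template.replace('LENGTH_PARAM', str(length))
    result = graph.run(query, startNode=start_name, endNode=end_name).data()
    if result:
        # The first length that yields a path wins, so skip the longer lengths.
        final_result = result[0]['path']
        break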
That's how it works. In the worst case you need to hit the graph four times for each start/end node pair. Network calls increase, but the average time taken should be reduced.
With a Java plugin, it can be reduced to one hit like the previous query, as you can do the loop inside the Java code.
I want to optimize a Cypher query because it is too slow to return the result.
My code is :
MATCH (e0{name:"dacomitinib"})-[r01]-(e1)-[r12]-(e2)-[r23]-(e3{name:"rucaparib camsylate"})
WHERE (e1:GeneEntity or e1:CompoundEntity or e1:DrugsEntity or e1:DiseaseEntity or e1:ProteinEntity)
and (e2:GeneEntity or e2:CompoundEntity or e2:DrugsEntity or e2:DiseaseEntity or e2:ProteinEntity)
RETURN e0.name,r01.confidence,e1.name,r12.confidence,e2.name,r23.confidence,e3.name
What should I do?
update one:
The PROFILE of my code is
Cypher version: CYPHER 3.5, planner: COST, runtime: INTERPRETED. 86876729 total db hits in 53454 ms.
There are some ways you can improve the performance of your query.
1. Create an index on the name property:
CREATE INDEX ON :GeneEntity(name)
Do the same for the other labels as well.
2. Use labels when matching (here for e0 and e3): labels reduce the number of nodes to scan. If you don't use labels, Neo4j will compare all the nodes.
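For "the other labels", one low-effort option is to generate the statements from the label list used in the question. This is only a sketch: it assumes the Neo4j 3.5 index syntax shown above and that name is the property to index on every label; the helper function is hypothetical.

```python
# Labels taken from the WHERE clause of the question.
LABELS = ["GeneEntity", "CompoundEntity", "DrugsEntity",
          "DiseaseEntity", "ProteinEntity"]


def index_statements(labels, prop="name"):
    """Build one CREATE INDEX statement (3.5 syntax) per label."""
    return [f"CREATE INDEX ON :{label}({prop})" for label in labels]


for stmt in index_statements(LABELS):
    print(stmt)   # run each one in the browser, or pass it to your driver
```

Each statement has to be run on its own, since schema changes cannot be mixed with data queries in one transaction.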
Solution:
Your query internally results in an AllNodesScan.
An AllNodesScan is a bad idea!
A better solution could be:
MATCH (e0{name:"dacomitinib"}), (e3{name:"rucaparib camsylate"})
WITH e0, e3
MATCH (e0)-[r01]-(e1)-[r12]-(e2)-[r23]-(e3)
WHERE
head(labels(e1)) IN ['GeneEntity','CompoundEntity','DrugsEntity','DiseaseEntity','ProteinEntity']
AND
head(labels(e2)) IN ['GeneEntity','CompoundEntity','DrugsEntity','DiseaseEntity','ProteinEntity']
RETURN e0.name, r01.confidence, e1.name, r12.confidence, e2.name, r23.confidence, e3.name
I want to look up the top 5 (shortest) paths in my graph (Neo4j 3.0.4) from point A to point Z.
The graph consists of several nodes that are connected by the relationship "CONNECTED_TO". This relationship has a cost property that should be minimized.
I started with this:
MATCH p=(from:Stop{stopId:'A'}), (to:Stop{stopUri:'Z'}),
path = allShortestPaths((from)-[:CONNECTED_TO*]->(to))
WITH REDUCE (total = 0, r in relationships(p) | total + r.cost) as tt, path
RETURN path, tt
This query always returns the subgraph with the fewest hops; the cost property is not considered. There is another subgraph with more hops that has a lower total cost. What am I doing wrong?
Furthermore, I actually want to get the top 5 subgraphs. If I execute this query:
MATCH p=(from:Stop{stopUri:'A'})-[r:CONNECTED_TO*10]->(to:Stop{stopUri:'Z'}) RETURN p
I can see several paths, but the first query returns just one path.
The path should not contain loops etc. of course.
I want to execute this query via the REST API, so a REST call or Cypher query should do it.
EDIT1:
I want to execute this as a REST call, so I tried the Dijkstra algorithm. This seems to be a good way, but I have to calculate the weight by adding 3 different cost properties on the relationship. How could this be achieved?
allShortestPaths will find the shortest path between two points and then match every path that has the same number of hops. If you want to minimize based on cost rather than traversal length, try something like this:
MATCH (from:Stop{stopId:'A'}), (to:Stop{stopUri:'Z'}),
path = (from)-[:CONNECTED_TO*]->(to)
WITH REDUCE (total = 0, r in relationships(path) | total + r.cost) as cost, path
ORDER BY cost
RETURN path LIMIT 5
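To see why fewest hops and lowest cost can disagree, here is a small in-memory sketch in Python. It is not what Neo4j runs internally: it is a best-first search over loop-free paths that returns up to k paths in order of total cost, assuming non-negative costs. The toy graph and function name are invented for the example.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]   # node -> [(neighbour, cost)]


def cheapest_paths(graph: Graph, start: str, end: str, k: int = 5):
    """Return up to k loop-free paths from start to end, cheapest first."""
    found = []
    heap = [(0.0, [start])]                  # (total cost so far, path)
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)     # always extend the cheapest partial path
        node = path[-1]
        if node == end:
            found.append((cost, path))
            continue
        for nxt, step_cost in graph.get(node, []):
            if nxt not in path:              # no loops
                heapq.heappush(heap, (cost + step_cost, path + [nxt]))
    return found


# Hypothetical data: the direct hop is expensive, the 3-hop detour is cheap.
toy = {"A": [("Z", 10.0), ("B", 1.0)], "B": [("C", 1.0)], "C": [("Z", 1.0)]}
print(cheapest_paths(toy, "A", "Z"))
# [(3.0, ['A', 'B', 'C', 'Z']), (10.0, ['A', 'Z'])]
```

A hop-based search would pick A to Z directly; ordering by summed cost picks the detour first. If the weight is the sum of several properties, the same idea applies with step_cost computed from all of them.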
I am looking for a way to generate some small graph queries in Neo4j.
As the database is huge, can anyone suggest a procedure to generate small query graphs (3-5 nodes, e.g. a -> b -> c -> a)?
I can run BFS from a node, but how can I find a small subgraph that contains only a specific number of nodes?
   a
  / \
b-----c----d
[UPDATED]
If you want to get a single arbitrary path of length 4 (having 4 relationships and 5 nodes), and you do not need the path to be unidirectional, then you can simply do this:
MATCH p=()-[*4]-()
RETURN p
LIMIT 1;
If you want the path to be unidirectional (where all relationships point in the same direction), then you just need to specify a direction:
MATCH p=()-[*4]->()
RETURN p
LIMIT 1;
I'm working with a graph that has thousands of nodes. Say I have person nodes, and FRIENDS relationships between them. e.g., gus-[:FRIENDS]-skylar
If I wanted to find the shortest friend path between hank and gus as long as they're not separated by more than 20 rels, I could do this:
START hank=node(68), gus=node(66)
MATCH p = shortestPath((hank)-[:FRIENDS*..20]-(gus))
RETURN p
This works and is fast, even when the found shortest path is of length 10 or more.
But say I wanted to find a path from hank to gus that does not go through glenn?
The query I've tried is this:
START hank=node(68), gus=node(66), glenn=node(59)
MATCH p =(hank)-[:FRIENDS*..20]-(gus)
WHERE NOT glenn IN nodes(p)
RETURN p
ORDER BY length(p)
LIMIT 1;
This works on very small graphs (30 or so people), but with thousands of people the JVM runs out of heap space.
So I'm guessing Cypher finds ALL paths between gus and hank of length 20 or less, and then applies the WHERE filter? It's clear why that would be slow.
In an abstract sense, this algorithm should be doable with the same big O runtime, because all that would change is that you check to make sure each node (as you search) isn't the one you want to avoid.
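The point about big O can be illustrated outside the database with a plain breadth-first search that simply refuses to step onto the excluded node. This is a minimal Python sketch on made-up data (the "walt" node and the adjacency dict are invented), not something Cypher does for you.

```python
from collections import deque


def shortest_path_avoiding(adj, start, goal, avoid, max_hops=20):
    """BFS shortest path from start to goal that never visits `avoid`."""
    if avoid in (start, goal):
        return None
    parents = {start: None}                  # doubles as the visited set
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            path = []
            while node is not None:          # walk parent links back to start
                path.append(node)
                node = parents[node]
            return path[::-1]
        if hops == max_hops:
            continue
        for nxt in adj.get(node, ()):
            if nxt != avoid and nxt not in parents:   # the only extra check
                parents[nxt] = node
                queue.append((nxt, hops + 1))
    return None


# Hypothetical FRIENDS graph, undirected.
friends = {
    "hank": ["glenn", "skylar"],
    "glenn": ["hank", "gus"],
    "skylar": ["hank", "walt"],
    "walt": ["skylar", "gus"],
    "gus": ["glenn", "walt"],
}
print(shortest_path_avoiding(friends, "hank", "gus", avoid="glenn"))
# ['hank', 'skylar', 'walt', 'gus']
```

The exclusion costs one comparison per neighbour, so the search stays linear in nodes plus relationships, unlike enumerating every path and filtering afterwards.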
Any suggestions for how to accomplish this? I'm pretty new to Cypher.
If this is not possible with Cypher, can you recommend some other database and graph language "stack"?
Thanks
Can you test the performance of the following query? The main difference is that it compares paths instead of nodes. I've added a direction in the paths as well, as that will speed up the query.
START hank=node(68), gus=node(66), glenn=node(59)
MATCH p = allshortestPaths((hank)-[:FRIENDS]->(gus))
WITH COLLECT(p) AS gusPaths, hank, glenn
MATCH p2 = allshortestPaths((hank)-[:FRIENDS]->(glenn))
WITH COLLECT(p2) AS glennPaths, gusPaths
WITH filter(x IN gusPaths
WHERE NONE (x2 IN glennPaths
WHERE x = x2)) AS filtered
RETURN filtered
ORDER BY length(filtered)
LIMIT 1