neo4j -- Find all shortest paths between more than 2 nodes - neo4j

For example,I want to query allShortestPaths between 3 nodes(A,B,C),it means i want to query:
1. the allShortestPaths between A and B
2. the allShortestPaths between C and B
3. the allShortestPaths between A and C
but I only find the allShortestPaths query to get allShortestPaths between two nodes.
As follow:
MATCH (node1:E { eid:"a9c2f114-796f-4934-a2d0-04bb3345e1d2" }),
(node2:E { eid:"01968dd2-1ed6-472d-82e9-be7701036b3b" }),
p = allShortestPaths((node1)-[*]-(node2))
RETURN p LIMIT 25
I am wondering if there exists a allShortestPaths query supporting more than 2 nodes input?
Now,to search 3 nodes,I have to invoke the "allShortestPaths" three times,as follow:
MATCH (node1:E { eid:"b73ade90-dfa1-4b94-bd0f-c16fd93bd680" }),
(node2:E { eid:"ddb5c52d-7002-4ac7-87d5-0f727f2ab3e7" }),
(node3:E { eid:"0398b081-6676-4a91-856b-abbabaee5e70" }) ,
p = allShortestPaths((node1)-[*]-(node2)),
q = allShortestPaths((node3)-[*]-(node2)),
m = allShortestPaths((node3)-[*]-(node1))
RETURN p,q,m LIMIT 10
What i want to do is to search allShortestPaths between arbitrary number of nodes.
So far,I intend to write user-defined procedures,but it will costs more time.I wondering who can provide better advice.
i want to search search allShortestPaths between serveral nodes.
such as: allShortestPaths((a)-[*]-(b)-[*]-(c)-[*]-(a))
I want get the all shortest path between a and b,b and c,c and a in a query

You need a nested loops:
// Array of id
WITH ["b73ade90-dfa1-4b94-bd0f-c16fd93bd680",
"ddb5c52d-7002-4ac7-87d5-0f727f2ab3e7",
"0398b081-6676-4a91-856b-abbabaee5e70"] as IDS
UNWIND IDS as vid
// Looking for the desired nodes
MATCH (N:E {id: vid})
WITH collect(N) as NS
// Nested loops
UNWIND RANGE(0, size(NS)-2) as i1
UNWIND RANGE(i1+1, size(NS)-1) as i2
WITH NS[i1] as N1,
NS[i2] as N2
// Get paths
MATCH ps = allShortestPaths((N1)-[*]-(N2))
RETURN ps

Neo4j doesn't provide a version of allShortestPaths taking multiple patterns, which is what you want:
allShortestPaths((node1)-[*]-(node2), (node1)-[*]-(node3), (node2)-[*]-(node3))
You wish to optimize the traversals by piggy-backing on the first one to do the second one at the same time, but there's no such thing out of the box, and it wouldn't do the third one either. It's a really specific use case.
You either have to call allShortestPaths n*(n-1) times (for n nodes) in Cypher, or try implementing it yourself server-side in a procedure using the Traversal framework.

here a sample cypher
MATCH (n:Entity) where n.name IN {names}
WITH collect(n) as nodes
UNWIND nodes as n
UNWIND nodes as m
WITH * WHERE id(n) < id(m)
MATCH path = allShortestPaths( (n)-[*..4]-(m) )
RETURN path
see https://neo4j.com/developer/kb/all-shortest-paths-between-set-of-nodes/ for more

Related

Cypher match path with intermediate nodes

I have the following graph with Stop (red) and Connection (green) nodes.
I want to find the shortest path from A to C using a cost property on Connection.
I would like to avoid making Connection a relationship because than I loose the CONTAINS relationship of Foo.
I can match a single hop like this
MATCH p=(:Stop {name:'A'})<-[:BEGINS_AT]-(:Connection)-[:ENDS_AT]->(:Stop {name:'B'}) RETURN p
but this does not work with an arbitrary number of Connections like it would with relationships and [*].
I also tried to make a projection down to simple relationships but it seems like I cannot do something with this without GDS.
MATCH (s1:Stop)<-[:BEGINS_AT]-(c:Connection)-[:ENDS_AT]->(s2:Stop) RETURN id(s1) AS source, id(s2) AS target, c.cost AS cost
Note that the connection is unidirectional, so it must not be possible to go from C to A.
Is there a way to do this without any Neo4j plugins?
This should get all usable paths (without plugins):
WITH ['BEGINS_AT', 'ENDS_AT'] AS types
MATCH p=(a:Stop)-[:BEGINS_AT|ENDS_AT*]-(b:Stop)
WHERE a.name = 'A' AND b.name = 'B' AND
ALL(i IN RANGE(0, LENGTH(p)-1) WHERE TYPE(RELATIONSHIPS(p)[i]) = types[i%2])
RETURN p
To get the shortest path:
WITH ['BEGINS_AT', 'ENDS_AT'] AS types
MATCH p=(a:Stop)-[:BEGINS_AT|ENDS_AT*]-(b:Stop)
WHERE a.name = 'A' AND b.name = 'B' AND
ALL(i IN RANGE(0, LENGTH(p)-1) WHERE TYPE(RELATIONSHIPS(p)[i]) = types[i%2])
RETURN p
ORDER BY LENGTH(p)
LIMIT 1;
or
WITH ['BEGINS_AT', 'ENDS_AT'] AS types
MATCH p=shortestpath((a:Stop)-[:BEGINS_AT|ENDS_AT*]-(b:Stop))
WHERE a.name = 'A' AND b.name = 'B' AND
ALL(i IN RANGE(0, LENGTH(p)-1) WHERE TYPE(RELATIONSHIPS(p)[i]) = types[i%2])
RETURN p
If you want to calculate the weighted shortest path, then it is the easiest to use GDS or even APOC plugin. You could probably create a shortest weighted path function with cypher but it would be not optimized. I can only think of finding all paths between the two nodes and suming the weights. In the next step you would filter the path with the minimum sum of weight. This would not scale well though.
As for the second part of your question I would need more information as I dont know exactly what you want.

How Many Nodes Are Involved in a Match

How can I know how many nodes and edges are involved in a MATCH? Is there another way besides Explain / Profile Match?
If you mean how many nodes are matched in a path, such as a variable-length path, then you can assign a path variable for this:
MATCH p = (k:Person {name:'Keanu Reeves'})-[*..8]-(t:Person {name:'Tom Hanks'})
WITH p LIMIT 1
RETURN p, length(p) as pathLength, length(p) + 1 as numberOfNodesInPath
You can also use nodes(p) and relationships(p) to get the collection of nodes and relationships that make up the path, and you can use size() on those collections to get their size.
There exists the COUNT() function of Cypher that allows you to count the number of elements. As for example in this query:
MATCH (n)
RETURN COUNT(n);
This query will count all nodes in your database.
You can find more information in the cypher manual, under the aggregating functions. Check it out.
The following Cypher snippet should return the number of distinct nodes and relationships found by any given MATCH clause. Just replace <your code here> with your MATCH pattern.
MATCH <your code here>
WITH COLLECT(NODES(p)) AS ns, SUM(SIZE(RELATIONSHIPS(p))) AS relCount
UNWIND ns AS nodeList
UNWIND nodeList AS node
RETURN COUNT(DISTINCT node) AS nodeCount, relCount;

How to check if nodes are connect at all in Neo4j

I have a graph in Neo4j (first time using it) of about 10 different nodes that are connected in various ways. Not all nodes are connected to each other, as some have up to 6 or 7 neighbors, while some have only 1. What query would I write/use to check if a path exists from NodeA to NodeB? It doesn't have to be the shortest path, just if a path exists.
Along with this, is there a way to count who has the most or least neighbors? Thanks everyone for help in advance.
Return Foo nodes a and b if there is at least one path between them. (This variable-length path query with unbounded length could take a very long time or run out of memory if there are a lot of paths or very long paths).
MATCH (a:Foo {id: 'a'}), (b:Foo {id: 'b'})
WHERE (a)-[*]-(b)
RETURN a, b;
Return all paths between a and b. (This query could require even more time and memory than the previous query, since it will attempt to return all matching paths).
MATCH path=(a:Foo {id: 'a'})-[*]-(b:Foo {id: 'b'})
RETURN path;
Return the 10 nodes with the most neighbors, in descending order:
MATCH (n)--()
WITH n, COUNT(*) AS c
RETURN n
ORDER BY c DESC
LIMIT 10;

neo4j how to use count(distinct()) over the nodes of path

I search the longest path of my graph and I want to count the number of distinct nodes of this longest path.
I want to use count(distinct())
I tried two queries.
First is
match p=(primero)-[:ResponseTo*]-(segundo)
with max(length(p)) as lengthPath
match p1=(primero)-[:ResponseTo*]-(segundo)
where length(p1) = lengthPath
return nodes(p1)
The query result is a graph with the path nodes.
But if I tried the query
match p=(primero)-[:ResponseTo*]-(segundo)
with max(length(p)) as lengthPath
match p1=(primero)-[:ResponseTo*]-(segundo)
where length(p1) = lengthPath
return count(distinct(primero))
The result is
count(distinct(primero))
2
How can I use count(distinct()) over the node primero.
Node Primero has a field called id.
You should bind at least one of those nodes, add a direction and also consider a path-limit otherwise this is an extremely expensive query.
match p=(primero)-[:ResponseTo*..30]-(segundo)
with p order by length(p) desc limit 1
unwind nodes(p) as n
return distinct n;

Neo4j cypher query efficiency and syntax

I am attempting to query an ontology of health represented as an acyclic, directed graph in Neo4j v2.1.5. The database consists of 2 million nodes and 5 million edges/relationships. The following query identifies all nodes subsumed by a disease concept and caused by a particular bacteria or any of the bacteria subtypes as follows:
MATCH p = (a:ObjectConcept{disease}) <-[:ISA*]- (b:ObjectConcept),
q=(c:ObjectConcept{bacteria})<-[:ISA*]-(d:ObjectConcept)
WHERE NOT (b)-->()--(c) AND NOT (b)-->()-->(d)
RETURN distinct b.sctid, b.FSN
This query runs in < 1 second and returns the correct answers. However, adding one additional parameter adds substantial time (20 minutes). Example:
MATCH p = (a:ObjectConcept{disease}) <-[:ISA*]- (b:ObjectConcept),
q=(c:ObjectConcept{bacteria})<-[:ISA*]-(d:ObjectConcept),
t=(e:ObjectConcept{bacteria})<-[:ISA*]-(f:ObjectConcept),
WHERE NOT (b)-->()--(c)
AND NOT (b)-->()-->(d)
AND NOT (b)-->()-->(e)
AND NOT (b)-->()-->(f)
RETURN distinct b.sctid, b.FSN
I am new to cypher coding, but I have to imagine there is a better way to write this query to be more efficient. How would Collections improve this?
Thanks
I already answered that on the google group:
Hi Scott,
I presume you created indexes or constraints for :ObjectConcept(name) ?
I am working with an acyclic, directed graph (an ontology) that models
human health and am needing to identify certain diseases (example:
Pneumonia) that are infectious but NOT caused by certain bacteria
(staph or streptococcus). All concepts are Nodes defined as
ObjectConcepts. ObjectConcepts are connected by relationships such as
[ISA], [Pathological_process], [Causative_agent], etc.
The query requires:
a) Identification of all concepts subsumed by the concept Pneumonia as follows:
MATCH p = (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)
this already returns a number of paths, potentially millions, can you check that with
MATCH p = (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept) return count(*)
b) Identification of all concepts subsumed by Genus Staph and Genus Strep (including the concept Genus Staph and Genus Strep) as follows. Note:
with b MATCH (b) q = (c:ObjectConcept{Strep})<-[:ISA*]-(d:ObjectConcept), h = (e:ObjectConcept{Staph})<-[:ISA*]-(f:ObjectConcept)
this is then the cross product of the paths from "p", "q" and "h", e.g. if all 3 of them return 1000 paths, you're at 1bn paths !!
c) Identify all nodes(p) that do not have a causative agent of Strep (i.e., nodes(q)) or Staph (nodes(h)) as follows:
with b,c,d,e,f MATCH (b),(c),(d),(e),(f) WHERE (b)--()-->(c) OR (b)-->()-->(d) OR (b)-->()-->(e) OR (b)-->()-->(f) RETURN distinct b.Name;
you don't need the WITH or even the MATCH (b),(c),(d),(e),(f)
what connections are there between b and the other nodes ? do you have concrete ones? for the first there is also missing one direction.
the where clause can be a problem, in general you want to show that perhaps this query is better reproduced by a UNION of simpler matches
e.g
MATCH (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)-->()-->(c:ObjectConcept{name:Strep}) RETURN b.name
UNION
MATCH (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)-->()-->(e:ObjectConcept{name:Staph}) RETURN b.name
UNION
MATCH (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)-->()-->(d:ObjectConcept)-[:ISA*]->(c:ObjectConcept{name:Strep}) return b.name
UNION
MATCH (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)-->()-->(d:ObjectConcept)-[:ISA*]->(c:ObjectConcept{name:Staph}) return b.name
another option would be to utilize the shortestPath() function to find one or all shortest path(s) between Pneumonia and the bacteria with certain rel-types and direction.
Perhaps you can share the dataset and the expected result.
The query was successfully accomplished using UNION functions as follows:
MATCH p = (a:ObjectConcept{sctid:233604007}) <-[:ISA*]- (b:ObjectConcept),
q = (c:ObjectConcept{sctid:58800005})<-[:ISA*]-(d:ObjectConcept)
WHERE NOT (b)-->()--(c) AND NOT (b)-->()-->(d)
RETURN distinct b
UNION
MATCH p = (a:ObjectConcept{sctid:233604007}) <-[:ISA*]- (b:ObjectConcept),
t = (e:ObjectConcept{sctid:65119002}) <-[:ISA*]- (f:ObjectConcept)
WHERE NOT (b)-->()-->(e) AND NOT (b)-->()-->(f)
RETURN distinct b
The query runs in sub 20 seconds vs. 20 minutes by reducing the cardinality of the objects being queried.

Resources