Neo4j - shortestPath through node

I am trying to get the shortest path between node (a) and node (c) through a particular node (b) that has the label SomeImportantLabel. Drawn, this is what I want:
(a)-(?..)-(b:SomeImportantLabel)-(?..)-(c)
Note that (?..) means there may be any number of nodes in between.
Something like this is what I am looking for:
match p = allShortestPaths((a)-[*]-(b:SomeImportantLabel)-[*]-(c))
where id(a) = 123 and id(c) = 456
return nodes(p) as nodes, relationships(p) as rels;
Since it is not possible to have multiple relations in a shortestPath/allShortestPaths function, I have read here on SO that you would have to do it this way:
match p1 = allShortestPaths((a)-[*]-(b:SomeImportantLabel)), p2=allShortestPaths((b:SomeImportantLabel)-[*]-(c))
where id(a) = 123 and id(c) = 456
return nodes(p1)+nodes(p2) as nodes, relationships(p1)+relationships(p2) as rels;
This, however, gives me far too many nodes that are not even involved, and the query takes forever to process. I suspect the cause is that the two allShortestPaths functions may not be using the same (b) node, but I am not sure.
This would be the result more or less:
/-(v2)
/-(v1)
(a)-(x1)-(b)-(x2)-(c)
\-(y1) \-(z1)-(z2)
The ideal solution would be something like this:
(a)-(x1)-(b1)-(x2)-(c)
   \-(b2)-(y1)-(y2)-(c)
This means that there are 2 shortest paths found between (a) and (c) that go through a node (b) with label 'SomeImportantLabel'.

You can use the ANY/ALL/SINGLE/NONE functions to filter path results in the WHERE part, and Neo4j can apply those filters (at least for ALL/NONE if needed) while searching the path.
So for example...
MATCH p = allShortestPaths((a)-[*]-(c))
WHERE ID(a) = 123 AND ID(c) = 456
AND ANY(b in NODES(p) WHERE a<>b<>c AND b:SomeImportantLabel)
RETURN nodes(p) as nodes, relationships(p) as rels;
Also, while we could truncate the head/tail of the list from the filter set of ANY, the Cypher planner likes for the same filter to apply to the whole path, so it's better to exclude them in the WHERE part.
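To make the effect of the ANY filter concrete, here is a minimal plain-Python sketch (not Neo4j code) that enumerates all shortest paths on a small in-memory graph and keeps only those with a labelled intermediate node. The graph, the `important` set, and the function name are invented for illustration.

```python
from collections import deque

def all_shortest_paths(adj, start, goal):
    """Return every shortest path from start to goal in an undirected graph."""
    dist = {start: 0}
    preds = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                preds[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                # another equally short way to reach nxt
                preds[nxt].append(node)
    if goal not in dist:
        return []
    paths = []

    def build(node, suffix):
        # walk the predecessor lists back to the start node
        if node == start:
            paths.append([start] + suffix)
            return
        for p in preds[node]:
            build(p, [node] + suffix)

    build(goal, [])
    return paths

# Hypothetical graph: three equally short routes from a to c
edges = [("a", "x1"), ("x1", "b1"), ("b1", "x2"), ("x2", "c"),
         ("a", "b2"), ("b2", "y1"), ("y1", "y2"), ("y2", "c"),
         ("a", "p1"), ("p1", "p2"), ("p2", "p3"), ("p3", "c")]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

important = {"b1", "b2"}  # stands in for the SomeImportantLabel nodes

# Same idea as ANY(b IN NODES(p) WHERE ...): check only the intermediate nodes
through_label = [p for p in all_shortest_paths(adj, "a", "c")
                 if any(n in important for n in p[1:-1])]
```

One consequence worth noting: the filter only sees paths that are already shortest overall. If none of the shortest paths between the endpoints happens to contain a labelled node, the result is empty, even when a slightly longer path through such a node exists.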

Related

Cypher match path with intermediate nodes

I have the following graph with Stop (red) and Connection (green) nodes.
I want to find the shortest path from A to C using a cost property on Connection.
I would like to avoid making Connection a relationship, because then I lose the CONTAINS relationship of Foo.
I can match a single hop like this
MATCH p=(:Stop {name:'A'})<-[:BEGINS_AT]-(:Connection)-[:ENDS_AT]->(:Stop {name:'B'}) RETURN p
but this does not work with an arbitrary number of Connections like it would with relationships and [*].
I also tried to make a projection down to simple relationships, but it seems I cannot do anything with it without GDS.
MATCH (s1:Stop)<-[:BEGINS_AT]-(c:Connection)-[:ENDS_AT]->(s2:Stop) RETURN id(s1) AS source, id(s2) AS target, c.cost AS cost
Note that the connection is unidirectional, so it must not be possible to go from C to A.
Is there a way to do this without any Neo4j plugins?
This should get all usable paths (without plugins):
WITH ['BEGINS_AT', 'ENDS_AT'] AS types
MATCH p=(a:Stop)-[:BEGINS_AT|ENDS_AT*]-(b:Stop)
WHERE a.name = 'A' AND b.name = 'B' AND
ALL(i IN RANGE(0, LENGTH(p)-1) WHERE TYPE(RELATIONSHIPS(p)[i]) = types[i%2])
RETURN p
To get the shortest path:
WITH ['BEGINS_AT', 'ENDS_AT'] AS types
MATCH p=(a:Stop)-[:BEGINS_AT|ENDS_AT*]-(b:Stop)
WHERE a.name = 'A' AND b.name = 'B' AND
ALL(i IN RANGE(0, LENGTH(p)-1) WHERE TYPE(RELATIONSHIPS(p)[i]) = types[i%2])
RETURN p
ORDER BY LENGTH(p)
LIMIT 1;
or
WITH ['BEGINS_AT', 'ENDS_AT'] AS types
MATCH p=shortestPath((a:Stop)-[:BEGINS_AT|ENDS_AT*]-(b:Stop))
WHERE a.name = 'A' AND b.name = 'B' AND
ALL(i IN RANGE(0, LENGTH(p)-1) WHERE TYPE(RELATIONSHIPS(p)[i]) = types[i%2])
RETURN p
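The ALL(...) predicate in these queries checks that the relationship types along the path alternate between BEGINS_AT and ENDS_AT. As a rough illustration in plain Python (the function name is made up, and a path is reduced to a list of relationship type names):

```python
def alternates(rel_types, pattern=("BEGINS_AT", "ENDS_AT")):
    """True if rel_types[i] equals pattern[i % len(pattern)] for every index i."""
    return all(t == pattern[i % len(pattern)] for i, t in enumerate(rel_types))

# Two hops Stop -> Connection -> Stop -> Connection -> Stop
ok = alternates(["BEGINS_AT", "ENDS_AT", "BEGINS_AT", "ENDS_AT"])
# Walking a connection backwards starts with ENDS_AT and is rejected
backwards = alternates(["ENDS_AT", "BEGINS_AT"])
```

Because every Connection is entered through BEGINS_AT and left through ENDS_AT, this alternation is what keeps the traversal from running against the direction of a connection.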
If you want to calculate the weighted shortest path, it is easiest to use the GDS or even the APOC plugin. You could probably write a weighted shortest path query in plain Cypher, but it would not be optimized. The only approach I can think of is to find all paths between the two nodes and sum the weights along each one, then keep the path with the minimum total weight. This would not scale well, though.
As for the second part of your question, I would need more information, as I don't know exactly what you want.
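The brute-force idea described above (enumerate every path, sum the weights, keep the minimum) can be sketched in plain Python. This is only an illustration under assumptions: each Connection is flattened to a (source stop, target stop, cost) triple, and the costs are invented.

```python
def cheapest_path(edges, start, goal):
    """Enumerate all simple directed paths and return (cost, path) for the cheapest.

    Returns None when goal cannot be reached from start.
    """
    out = {}
    for src, dst, cost in edges:
        out.setdefault(src, []).append((dst, cost))

    best = None

    def dfs(node, path, total):
        nonlocal best
        if node == goal:
            if best is None or total < best[0]:
                best = (total, list(path))
            return
        for nxt, cost in out.get(node, []):
            if nxt not in path:  # keep paths simple (no repeated stops)
                path.append(nxt)
                dfs(nxt, path, total + cost)
                path.pop()

    dfs(start, [start], 0)
    return best

# Hypothetical connections: A->B, B->C and a direct but more expensive A->C
connections = [("A", "B", 4), ("B", "C", 3), ("A", "C", 9)]
forward = cheapest_path(connections, "A", "C")
reverse = cheapest_path(connections, "C", "A")
```

The number of simple paths can grow exponentially with graph size, which is why this approach does not scale and a dedicated algorithm such as Dijkstra's is preferable on anything but small graphs.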

How Many Nodes Are Involved in a Match

How can I know how many nodes and edges are involved in a MATCH? Is there another way besides EXPLAIN / PROFILE?
If you mean how many nodes are matched in a path, such as a variable-length path, then you can assign a path variable for this:
MATCH p = (k:Person {name:'Keanu Reeves'})-[*..8]-(t:Person {name:'Tom Hanks'})
WITH p LIMIT 1
RETURN p, length(p) as pathLength, length(p) + 1 as numberOfNodesInPath
You can also use nodes(p) and relationships(p) to get the collection of nodes and relationships that make up the path, and you can use size() on those collections to get their size.
Cypher has a COUNT() function that lets you count the number of elements. For example, in this query:
MATCH (n)
RETURN COUNT(n);
This query will count all nodes in your database.
You can find more information in the cypher manual, under the aggregating functions. Check it out.
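The following Cypher snippet should return the number of distinct nodes found by any given MATCH clause, together with the total number of relationships summed over all matched paths (a relationship shared by several paths is counted once per path). Just replace <your code here> with your MATCH pattern, which must be bound to the path variable p.
MATCH p = <your code here>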
WITH COLLECT(NODES(p)) AS ns, SUM(SIZE(RELATIONSHIPS(p))) AS relCount
UNWIND ns AS nodeList
UNWIND nodeList AS node
RETURN COUNT(DISTINCT node) AS nodeCount, relCount;
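The difference between counting distinct elements and summing per-path sizes is easy to see in a small plain-Python sketch. Here each path is represented as a pair of lists (node ids, relationship ids); the representation and the sample ids are assumptions for illustration only.

```python
def count_elements(paths):
    """Return (distinct node count, distinct relationship count, summed relationship count)."""
    nodes = set()
    rels = set()
    summed = 0
    for node_ids, rel_ids in paths:
        nodes.update(node_ids)
        rels.update(rel_ids)
        summed += len(rel_ids)  # what SUM(SIZE(RELATIONSHIPS(p))) adds up
    return len(nodes), len(rels), summed

# Two paths that share the relationship "r12"
sample_paths = [
    ([1, 2, 3], ["r12", "r23"]),
    ([1, 2, 4], ["r12", "r24"]),
]
node_count, distinct_rel_count, summed_rel_count = count_elements(sample_paths)
```

In this example the summed count is 4 while only 3 distinct relationships are involved, so the two numbers should not be used interchangeably when paths overlap.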

How to write cypher statement to combine nodes when an OPTIONAL MATCH is null?

Background
Hi all, I am currently trying to write a cypher statement that allows me to find a set of paths on a map from a starting point. I want my search result to always return connecting streets within 5 nodes. Optionally, if there's a nearby hospital, I would like my search pattern to also indicate nearby hospitals.
Main Problem
Because there isn't always a nearby hospital to the current street, sometimes my optional match search pattern comes back as null. Here's the current cypher statement I'm using:
MATCH path=(a:Street {id: 123})-[:CONNECTED_TO*..5]-(b:Street)
OPTIONAL MATCH optionalPath=(b)-[:CONNECTED_TO]->(hospital:Hospital)
WHERE ALL (x IN nodes(path) WHERE (x:Street))
WITH DISTINCT nodes(path) + nodes(optionalPath) as n
UNWIND n as nodes
RETURN DISTINCT nodes;
However, this syntax only works if optionalPath contains nodes. If it doesn't, the expression nodes(path) + nodes(optionalPath) adds null to a list, and I get no records. This is true even when the nodes(path) term does contain nodes.
What's the best way to get around this problem?
You can use COALESCE to replace a NULL with some other value. For example:
MATCH path=(:Street {id: 123})-[:CONNECTED_TO*..5]-(b:Street)
WHERE ALL (x IN nodes(path) WHERE x:Street)
OPTIONAL MATCH optionalPath=(b)-[:CONNECTED_TO]->(hospital:Hospital)
WITH nodes(path) + COALESCE(nodes(optionalPath), []) as n
UNWIND n as nodes
RETURN DISTINCT nodes;
I have also made a few other improvements:
The WHERE clause was moved up right after the first MATCH. This eliminates the unwanted path values immediately. Your original query would get all path values (even unwanted ones), always perform the second MATCH query, and only eliminate unwanted paths afterwards. (But it is actually not clear whether you need the WHERE clause at all; for example, if the CONNECTED_TO relationship is only used between Street nodes.)
The DISTINCT in your WITH clause would have prevented duplicate n collections, but the collections internally could have had duplicate paths. This was probably not what you wanted.
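The role COALESCE plays in the query above can be mimicked in plain Python, purely as an illustration. The helper names and sample values are hypothetical. Note one difference: in Cypher, list + null evaluates to null, whereas in Python list + None raises a TypeError; in both cases substituting an empty list avoids the problem.

```python
def coalesce(*values):
    """Return the first argument that is not None, or None if all are."""
    for value in values:
        if value is not None:
            return value
    return None

def combine(path_nodes, optional_nodes):
    """Concatenate the two node lists, treating a missing optional list as empty."""
    return path_nodes + coalesce(optional_nodes, [])

without_hospital = combine(["street1", "street2"], None)
with_hospital = combine(["street1", "street2"], ["street2", "hospital1"])
```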
It seems you don't really want the path, just all the street nodes within 5 steps, plus any connected hospitals. So I would simplify your query to just that, and then condense the 3 columns down to 1.
MATCH (a:Street {id: 123})-[:CONNECTED_TO*..5]-(b:Street)
OPTIONAL MATCH (b)-[:CONNECTED_TO]->(hospital:Hospital)
WITH collect(a) + collect(b) + collect(hospital) as n
UNWIND n as nodez
RETURN DISTINCT nodez;
If Streets can be indirectly connected (with a hospital in between), then I'd adjust it like this:
MATCH (a:Street {id: 123})-[:CONNECTED_TO]-(b:Street)
WITH a as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
OPTIONAL MATCH (b)-[:CONNECTED_TO]->(hospital:Hospital)
WITH nodez + collect(hospital) as n
UNWIND n as nodez
RETURN DISTINCT nodez;
It's a bit more verbose, but it says exactly what you want (and also adds the start node to the list of nodes checked for hospitals).

neo4j -- Find all shortest paths between more than 2 nodes

For example, I want to query allShortestPaths between 3 nodes (A, B, C), which means I want:
1. allShortestPaths between A and B
2. allShortestPaths between C and B
3. allShortestPaths between A and C
However, I can only find an allShortestPaths query that works between two nodes,
as follows:
MATCH (node1:E { eid:"a9c2f114-796f-4934-a2d0-04bb3345e1d2" }),
(node2:E { eid:"01968dd2-1ed6-472d-82e9-be7701036b3b" }),
p = allShortestPaths((node1)-[*]-(node2))
RETURN p LIMIT 25
I am wondering whether there is an allShortestPaths query that accepts more than 2 nodes as input.
For now, to search 3 nodes, I have to invoke allShortestPaths three times, as follows:
MATCH (node1:E { eid:"b73ade90-dfa1-4b94-bd0f-c16fd93bd680" }),
(node2:E { eid:"ddb5c52d-7002-4ac7-87d5-0f727f2ab3e7" }),
(node3:E { eid:"0398b081-6676-4a91-856b-abbabaee5e70" }) ,
p = allShortestPaths((node1)-[*]-(node2)),
q = allShortestPaths((node3)-[*]-(node2)),
m = allShortestPaths((node3)-[*]-(node1))
RETURN p,q,m LIMIT 10
What I want is to search allShortestPaths between an arbitrary number of nodes.
So far, I intend to write user-defined procedures, but that will take more time. I wonder whether anyone can offer better advice.
I want to search allShortestPaths between several nodes,
such as: allShortestPaths((a)-[*]-(b)-[*]-(c)-[*]-(a))
I want to get all shortest paths between a and b, b and c, and c and a in one query.
You need nested loops:
// Array of id
WITH ["b73ade90-dfa1-4b94-bd0f-c16fd93bd680",
"ddb5c52d-7002-4ac7-87d5-0f727f2ab3e7",
"0398b081-6676-4a91-856b-abbabaee5e70"] as IDS
UNWIND IDS as vid
// Looking for the desired nodes
MATCH (N:E {eid: vid})
WITH collect(N) as NS
// Nested loops
UNWIND RANGE(0, size(NS)-2) as i1
UNWIND RANGE(i1+1, size(NS)-1) as i2
WITH NS[i1] as N1,
NS[i2] as N2
// Get paths
MATCH ps = allShortestPaths((N1)-[*]-(N2))
RETURN ps
Neo4j doesn't provide a version of allShortestPaths taking multiple patterns, which is what you want:
allShortestPaths((node1)-[*]-(node2), (node1)-[*]-(node3), (node2)-[*]-(node3))
You wish to optimize the traversals by piggy-backing on the first one to do the second one at the same time, but there's no such thing out of the box, and it wouldn't do the third one either. It's a really specific use case.
You either have to call allShortestPaths n*(n-1)/2 times (once per unordered pair of the n nodes) in Cypher, or try implementing it yourself server-side in a procedure using the Traversal framework.
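The pairwise approach visits each unordered pair of nodes exactly once. A minimal Python sketch of that pairing, mirroring the nested UNWIND RANGE loops above (the function name and sample values are illustrative only):

```python
from itertools import combinations

def unordered_pairs(items):
    """Return every pair (x, y) where x comes before y in items."""
    return list(combinations(items, 2))

pairs = unordered_pairs(["A", "B", "C"])
pair_count_for_five = len(unordered_pairs(range(5)))
```

For 3 nodes this yields 3 pairs, matching the three allShortestPaths calls in the question; for n nodes it yields n*(n-1)/2 pairs, since (A, B) and (B, A) describe the same undirected search.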
Here is a sample Cypher query (written with the $names parameter syntax; older Neo4j versions wrote this as {names}):
MATCH (n:Entity) WHERE n.name IN $names
WITH collect(n) as nodes
UNWIND nodes as n
UNWIND nodes as m
WITH * WHERE id(n) < id(m)
MATCH path = allShortestPaths( (n)-[*..4]-(m) )
RETURN path
See https://neo4j.com/developer/kb/all-shortest-paths-between-set-of-nodes/ for more.

neo4j how to use count(distinct()) over the nodes of path

I am searching for the longest path in my graph, and I want to count the number of distinct nodes on this longest path.
I want to use count(distinct()).
I tried two queries.
First is
match p=(primero)-[:ResponseTo*]-(segundo)
with max(length(p)) as lengthPath
match p1=(primero)-[:ResponseTo*]-(segundo)
where length(p1) = lengthPath
return nodes(p1)
The query result is a graph with the path nodes.
But if I tried the query
match p=(primero)-[:ResponseTo*]-(segundo)
with max(length(p)) as lengthPath
match p1=(primero)-[:ResponseTo*]-(segundo)
where length(p1) = lengthPath
return count(distinct(primero))
The result is
count(distinct(primero))
2
How can I use count(distinct()) over the node primero?
The node primero has a field called id.
You should bind at least one of those nodes, add a direction, and also consider a path-length limit; otherwise this is an extremely expensive query.
match p=(primero)-[:ResponseTo*..30]-(segundo)
with p order by length(p) desc limit 1
unwind nodes(p) as n
return distinct n;
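The query above returns the distinct nodes themselves; in Cypher, ending it with return count(distinct n) instead should give the number directly. The underlying logic (take the longest path, then count its distinct nodes) can be sketched in plain Python, assuming each path is given as a list of node ids (a made-up representation for illustration):

```python
def distinct_nodes_on_longest(paths):
    """Count the distinct nodes on the path with the most relationships."""
    # A path with k nodes has k - 1 relationships
    longest = max(paths, key=lambda p: len(p) - 1)
    return len(set(longest))

# The second path revisits node 1, so it has 5 node positions but 4 distinct nodes
sample = [[1, 2], [1, 2, 3, 1, 4]]
distinct_count = distinct_nodes_on_longest(sample)
```

A path can pass through the same node more than once while never reusing a relationship, which is why the distinct count can be smaller than the path length plus one.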
