Neo4j: finding two nodes such that the shortest path between them is of length n

I would like to know if there is a way to find two nodes such that the shortest path between them is of a specific length, say, 10.
All my nodes have the same label, "n1", and the shortest path can go through any relationship type.
So far I have been doing this manually: I find the shortest path between node n and node m, keep changing n and m, and stop when I find a path of length 10.
Here is the Cypher query:
match sp = shortestpath((startNode)-[*]->(endNode)) where id(startNode) = 1 and id(endNode) = 2 return sp
Note that I do not specify the node label, since I only have one label in the graph.
So I just keep changing the start and end nodes and rerunning the query until I find a path of the desired length.
I'm sure there is an easier way to do this, but since I am a Neo4j beginner I am struggling to figure it out.
I have also tried this:
MATCH (n1), (n2)
WHERE n1 <> n2 and shortestPath((n1)-[*]-(n2)) = 5
RETURN n1, n2
LIMIT 2
However, I don't believe this is correct: shortest paths of length 5 are very common in my graph, yet the query takes a long time to execute...

[UPDATED]
This query should be more performant. It avoids a Cartesian product, places an upper bound on the variable-length relationship pattern, and does not even use shortestPath.
MATCH p=(n1)-[*10]->(n2)
WHERE n1 <> n2 AND NOT (n1)-[*..9]->(n2)
RETURN n1, n2
LIMIT 1
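The equivalence this query relies on can be checked outside the database. Below is a minimal Python sketch, assuming the graph is held in memory as a dict mapping each node to its outgoing neighbours (a hypothetical stand-in for the Neo4j store, not a Neo4j API). Breadth-first search gives exact hop distances, and a pair qualifies only when its distance is exactly k, which is what `-[*10]->` combined with `NOT (n1)-[*..9]->(n2)` expresses.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every node reachable along directed edges."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def find_pair_at_distance(adj, k):
    """Return the first (n1, n2) whose shortest directed path has exactly k hops, else None.

    A path of length k exists and no path of length 1..k-1 exists
    exactly when the BFS distance equals k.
    """
    for n1 in adj:
        for n2, d in bfs_distances(adj, n1).items():
            if d == k:  # for k > 0 this also guarantees n1 != n2
                return n1, n2
    return None


# Toy graph: 0->1->2->3 plus a shortcut 0->3.
graph = {0: [1, 3], 1: [2], 2: [3]}
print(find_pair_at_distance(graph, 2))  # (0, 2)
print(find_pair_at_distance(graph, 3))  # None: 0->1->2->3 exists, but 0->3 is shorter
```

Running one BFS per start node can cost O(V·(V+E)), so this sketch illustrates the logic of the query rather than replacing it.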

This has worked for me!
MATCH (n1), (n2)
WHERE n1 <> n2 and length(shortestPath((n1)-[*]->(n2))) = 10
RETURN n1, n2
LIMIT 1

Related

How do I calculate average shortest path of a network in Neo4J using cypher

I have 9 nodes in a directed network (all have at least 1 connection) so a total of 72 shortest paths. I want to find the average of the 72 shortest paths.
Here is the code I used to find all shortest paths between a set of nodes (modified from https://community.neo4j.com/t/all-shortest-paths-between-a-set-of-nodes/241)
MATCH (p:Person)
WITH collect(p) as nodes
UNWIND nodes as n
UNWIND nodes as m
WITH * WHERE id(n)<id(m)
MATCH path = allShortestPaths((n)-[:KNOWS*]-(m))
RETURN length(path)
The result at first looks to be correct. It has paths of 1, 2, 3, 4, 5, and 6. However, I noticed there are 408 results when there should only be 72.
Would appreciate any insight into where I went wrong.
The allShortestPaths function returns all shortest paths, so it can return multiple paths if they all have the same (shortest) length. To get just 1 shortest path, you should use the shortestPath function instead.
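To see how allShortestPaths can return many more rows than there are node pairs, here is a hedged Python sketch that counts the shortest paths for each pair with breadth-first search. It assumes an undirected, simple graph stored as a dict of neighbour lists (a toy stand-in for the :KNOWS graph); parallel relationships, which Cypher would treat as separate paths, are not modelled.

```python
from collections import deque
from itertools import combinations


def shortest_path_counts(adj, source):
    """BFS from source: dist[v] is the hop distance, count[v] the number of shortest paths."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:               # first time v is reached: next BFS level
                dist[v] = dist[u] + 1
                count[v] = count[u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:    # another shortest route into v
                count[v] += count[u]
    return dist, count


def row_counts(adj):
    """Rows per unordered pair returned by shortestPath vs. allShortestPaths."""
    one_per_pair = 0
    all_shortest = 0
    for n, m in combinations(sorted(adj), 2):   # like WHERE id(n) < id(m)
        dist, count = shortest_path_counts(adj, n)
        if m in dist:
            one_per_pair += 1
            all_shortest += count[m]
    return one_per_pair, all_shortest


# Toy 4-cycle a-b-c-d-a: opposite corners have two shortest paths each.
square = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
print(row_counts(square))                      # (6, 8)
print(len(list(combinations(range(9), 2))))    # 36 unordered pairs among 9 nodes
```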
The number of unordered pairs of distinct nodes among 9 nodes is not 9*8 (72); it is half of that, 9*8/2 = 36.
This query should return 36 results:
MATCH (p:Person)
WITH collect(p) as nodes
UNWIND nodes as n
UNWIND nodes as m
WITH n, m WHERE id(n)<id(m)
MATCH path = shortestPath((n)-[:KNOWS*]-(m))
RETURN length(path)
To get the average length, just use this RETURN clause instead:
RETURN AVG(length(path))
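As a cross-check of what AVG(length(path)) computes, here is a small Python sketch under the same assumptions (undirected, unweighted graph held in a dict; all names are illustrative): one BFS distance per unordered pair, averaged. Pairs with no connecting path produce no row in the Cypher query, so they are skipped here too.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_shortest_path(adj):
    """Mean shortest-path length over connected unordered pairs, or None if there are none."""
    lengths = []
    for n, m in combinations(sorted(adj), 2):
        d = bfs_distances(adj, n).get(m)
        if d is not None:       # unreachable pairs produce no row in Cypher either
            lengths.append(d)
    return sum(lengths) / len(lengths) if lengths else None


# Toy path graph a-b-c-d: lengths 1, 2, 3, 1, 2, 1 over its 6 pairs.
line = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(average_shortest_path(line))  # 1.6666666666666667
```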

How could I optimize this cypher query?

When I used this cypher query
match p=(n)-[r*8]-(n)
where id(n)=548
with p
where ALL(x IN nodes(p)[1..length(p)] WHERE SINGLE(y IN nodes(p)[1..length(p)] WHERE x=y))
return count(p)
It took 51922 ms to return the result, which is a really long time. How could I optimize this Cypher query? Any help would be appreciated.
Looks like you want a simple circuit with no repeating nodes (except the start and end node).
There's an APOC Procedure to get all simple paths between two nodes, with a maximum path length. It doesn't currently work when the start and end nodes are the same, but if we set the end node as any adjacent node to your start node, and filter to only keep paths of length 7 (since the paths exclude the last hop back to the start node), then we should be able to get the right answer extremely fast.
match (n)--(m)
where id(n) = 548
with distinct n, m
call apoc.algo.allSimplePaths(n, m, '', 7) YIELD path
with path
where length(path) = 7
return count(path)
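The trick of closing each simple path with one extra hop can be illustrated in plain Python. This is a minimal sketch, not the APOC implementation: it assumes an undirected simple graph stored in a dict (hypothetical data), enumerates simple paths of k-1 hops from the start node to each neighbour by depth-first search, and counts each one as a cycle of length k. It counts every cycle once per direction, as an undirected variable-length pattern does.

```python
def count_cycles_through(adj, start, k):
    """Count cycles of exactly k hops through start that repeat no other node.

    Each cycle is found once per direction, matching the undirected Cypher pattern.
    Assumes a simple graph (no self-loops or parallel edges), so k must be >= 3.
    """
    if k < 3:
        return 0
    total = 0
    for m in set(adj[start]):                  # like WITH DISTINCT n, m
        stack = [(start, (start,))]            # depth-first search over simple paths
        while stack:
            node, path = stack.pop()
            hops = len(path) - 1
            if node == m:
                if hops == k - 1:              # like WHERE length(path) = 7 for k = 8
                    total += 1
                continue                       # a simple path to m ends at m
            if hops == k - 1:
                continue                       # too long without reaching m
            for nxt in adj[node]:
                if nxt not in path:            # never revisit a node, including start
                    stack.append((nxt, path + (nxt,)))
    return total


# Toy 4-cycle 0-1-2-3-0: one cycle of length 4 through node 0, seen in both directions.
square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(count_cycles_through(square, 0, 4))  # 2

# Complete graph on 4 nodes: 3 triangles contain node 0, each counted twice.
k4 = {i: [j for j in range(4) if j != i] for i in range(4)}
print(count_cycles_through(k4, 0, 3))  # 6
```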

Optimizing Cypher Query Neo4j

I want to write a query in Cypher and run it on Neo4j.
The query is:
Given some start vertices, walk the edges and find all vertices that are connected to any of the start vertices.
(start)-[*]->(v)
for every edge E walked
if startVertex(E).someproperty != endVertex(E).someproperty, output E.
The graph may contain cycles.
For example, in the graph above, vertices are grouped by the "group" property. The query should return 7 rows, representing the 7 orange edges in the graph.
If I wrote the algorithm myself, it would be a simple depth-first or breadth-first search: for every edge visited, if the filter condition is true, output the edge. The complexity is O(V+E).
But I can't express this algorithm in Cypher, since it's a very different language.
Then I wrote this query:
find all reachable vertices:
(start)-[*]->(v), reachable = start + v.
find all edges starting from any reachable vertex; if an edge ends at a reachable vertex and passes the filter, output it.
match (reachable)-[]->(n) where n in reachable and reachable.someprop != n.someprop
So the Cypher code looks like this:
MATCH (n:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})
WITH n MATCH (n:Col)-[*]->(m:Col)
WITH collect(distinct n) + collect(distinct m) AS c1
UNWIND c1 AS rn
MATCH (rn:Col)-[]->(xn:Col) WHERE rn.schema<>xn.schema and xn in c1
RETURN rn,xn
The performance of this query is not as good as I expected. There is an index on :Col(schema).
I am running the Neo4j 2.3.0 Docker image from Docker Hub on my Windows laptop; it actually runs in a Linux virtual machine on the laptop.
My sample data is a small dataset that contains 0.1M vertexes and 0.5M edges. For some starting nodes it takes 60 or more seconds to complete this query. Any advice for optimizing or rewriting the query? Thanks.
The following code block is the logic I want:
VertexQueue1 = (starting vertexes);
VisitedVertexSet = (empty);
EdgeSet1 = (empty);
While (VertexQueue1 is not empty)
{
    Vertex0 = VertexQueue1.pop();
    VisitedVertexSet.add(Vertex0);
    foreach (Edge0 starting from Vertex0)
    {
        Vertex1 = endingVertex(Edge0);
        if (Vertex1.schema <> Vertex0.schema)
        {
            EdgeSet1.put(Edge0);
        }
        if (VisitedVertexSet.notContains(Vertex1)
            and VertexQueue1.notContains(Vertex1))
        {
            VertexQueue1.push(Vertex1);
        }
    }
}
return EdgeSet1;
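For reference, the pseudocode above translates closely into runnable Python. This is a hedged sketch, assuming the graph fits in memory as a dict of outgoing neighbours plus a dict holding each vertex's schema value (hypothetical names, not a Neo4j API). A single `seen` set replaces the two `notContains` checks, which keeps the traversal at O(V+E).

```python
from collections import deque


def cross_schema_edges(out_edges, schema, starts):
    """Breadth-first walk from `starts`; collect every edge whose endpoints differ in schema.

    out_edges: dict vertex -> list of successor vertices (directed, cycles allowed)
    schema:    dict vertex -> the property being compared
    """
    queue = deque(dict.fromkeys(starts))   # VertexQueue1, duplicates removed
    seen = set(queue)                      # visited or already queued
    edges = []                             # EdgeSet1
    while queue:
        v0 = queue.popleft()
        for v1 in out_edges.get(v0, ()):
            if schema[v1] != schema[v0]:
                edges.append((v0, v1))
            if v1 not in seen:             # each vertex is expanded at most once
                seen.add(v1)
                queue.append(v1)
    return edges


# Toy graph with a cycle a->b->c->d->a; e is not reachable from a.
out_edges = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["a"], "e": ["c"]}
schema = {"a": "s1", "b": "s1", "c": "s2", "d": "s2", "e": "s1"}
print(cross_schema_edges(out_edges, schema, ["a"]))  # [('b', 'c'), ('d', 'a')]
```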
EDIT:
The profile result shows that expanding all paths has a high cost. Looking at the row counts, it seems that the Cypher execution engine returns all paths, but I only want the distinct edge list.
LEFT one:
match (start:Col {table:"F_XXY_DSMK_ITRPNL_IDX_STAT_W"})
,(start)-[*0..]->(prev:Col)-->(node:Col)
where prev.schema<>node.schema
return distinct prev,node
RIGHT one:
MATCH (n:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})
WITH n MATCH (n:Col)-[*]->(m:Col)
WITH collect(distinct n) + collect(distinct m) AS c1
UNWIND c1 AS rn
MATCH (rn:Col)-[]->(xn:Col) WHERE rn.schema<>xn.schema and xn in c1
RETURN rn,xn
If I'm understanding the query correctly, I think Cypher makes this much easier than you're expecting. Try this:
MATCH (start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})-->(node:Col)
WHERE start.schema <> node.schema
RETURN start, node
Though I'm not sure why you're comparing the schema property on the nodes. Isn't the schema for the start node fixed by the value that you pass in?
I might not be understanding the query though. If you're looking for more than just the nodes connected to the start node, you could do:
MATCH
(start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"}),
(start)-[*0..]->(prev:Col)-->(node:Col)
WHERE prev.schema <> node.schema
RETURN prev, node
That open-ended variable length relationship specification might be slow, though.
Also note that when Cypher explores a particular path, it stops when it finds that it has looped back onto a relationship (not a node) already matched in that path, so cycles aren't really a problem.
Also, is the DWMDATA value that you're passing in interpolated? If so, you should think about using parameters for security / performance:
http://neo4j.com/docs/stable/cypher-parameters.html
EDIT:
Based on your comment, I have a couple of thoughts. First, limiting to DISTINCT paths isn't going to help, because every path that it finds is distinct. I think what you want is the distinct set of pairs, which could be achieved by just adding DISTINCT to the query:
MATCH
(start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"}),
(start)-[*0..]->(prev:Col)-->(node:Col)
WHERE prev.schema <> node.schema
RETURN DISTINCT prev, node
Here is another way to go about it which may or may not be more efficient, but might at least give you an idea for how to go about things differently:
MATCH
path=(start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})-[*]->(node:Col)
WITH rels(path) AS rels
UNWIND rels AS rel
WITH DISTINCT rel
WITH startNode(rel) AS start_node, endNode(rel) AS end_node
WHERE start_node.schema <> end_node.schema
RETURN start_node, end_node
I can't say that this would be faster, but here's another way to try:
MATCH (start:Col)-[*]->(node:Col)
WHERE start.property IN {property_values}
WITH collect(ID(node)) AS node_ids
MATCH (:Col)-[r]->(node:Col)
WHERE ID(node) IN node_ids
WITH DISTINCT r
RETURN startNode(r) AS start_node, endNode(r) AS end_node
I suspect that the problem in all cases is with the open-ended variable length path. I've actually asked on the Slack group to try to get a better understanding of how it works. In the meantime, for all the queries that you try I would suggest prefixing them with the PROFILE keyword to get a report from Neo4j on what parts of the query are slow.
// this is very inefficient!
MATCH (start:Col)-[*]->(node:Col)
WHERE start.property IN {property_values}
WITH distinct node
MATCH (prev)-[r]->(node)
RETURN distinct prev, node;
You might be better off with this:
MATCH (start:Col)
WHERE start.property IN {property_values}
MATCH (node:Col)
WHERE shortestPath((start)-[*]->(node)) IS NOT NULL
MATCH (prev)-[r]->(node)
RETURN distinct prev, node;

Neo4j Cypher query: how to get unique nodes up to two levels deep with their depth value

I am using a Cypher query in Neo4j.
My requirement:
I need to get the unique friends up to two levels deep, each with their shortest depth value.
The graph looks like this:
a-[:frnd]->b, b-[:frnd]->a
b-[:frnd]->c, c-[:frnd]->b
c-[:frnd]->d, d-[:frnd]->c
a-[:frnd]->c, c-[:frnd]->a
I tried:
START n=node(8) match p=n-[:frnd*1..2]->(x) return x.email, length(p)
My output is:
b 1 <--length(p)
a 2
c 2
c 1
d 2
a 2 and so on.
My required output:
My parent node (a) should not be listed.
I need only (c) with shortest length 1;
c with length 2 should not be repeated.
Please help me solve this.
(EDITED. Finding n via START n=node(8) causes problems with other variables later on. So, below we find n in the MATCH statement.)
MATCH p = shortestPath((n {email:"a"})-[:frnd*..2]->(x))
WHERE n <> x AND length(p) > 0
RETURN x.email, length(p)
ORDER BY length(p)
LIMIT 1
If there are multiple "closest friends", this returns one of them.
Also, the shortestPath() function does not support a minimum path length, so "1..2" had to become "..2", and the WHERE clause needed to specify length(p) > 0.
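The required output in the question (each friend listed once, at its minimum depth, with the start node excluded) is what a depth-limited breadth-first search produces. Here is a minimal Python sketch over the example graph, assuming it is held in a dict of outgoing :frnd neighbours (illustrative data only; `friends_within` is a hypothetical helper name).

```python
from collections import deque


def friends_within(adj, start, max_depth):
    """Map each node reachable from start in 1..max_depth hops to its shortest depth.

    The start node itself is excluded, and each friend appears once, at its minimum depth.
    """
    depth = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if depth[u] == max_depth:
            continue  # do not expand beyond the depth limit
        for v in adj.get(u, ()):
            if v not in depth:  # BFS reaches every node first at its minimum depth
                depth[v] = depth[u] + 1
                queue.append(v)
    del depth[start]
    return depth


# The frnd graph from the question, as outgoing relationships.
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["b", "d", "a"], "d": ["c"]}
print(friends_within(graph, "a", 2))  # {'b': 1, 'c': 1, 'd': 2}
```

Unlike the Cypher answer, which uses LIMIT 1 and so returns a single closest friend, this sketch returns every friend within two hops.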

Cypher query to get shortest path between A and B that doesn't go through C, that isn't massively slow? Or recommend alternative to Cypher/Neo4j

I'm working with a graph that has thousands of nodes. Say I have person nodes, and FRIENDS relationships between them. e.g., gus-[:FRIENDS]-skylar
If I wanted to find the shortest friend path between hank and gus as long as they're not separated by more than 20 rels, I could do this:
START hank=node(68), gus=node(66)
MATCH p = shortestPath((hank)-[:FRIENDS*..20]-(gus))
RETURN p
This works and is fast, even when the found shortest path is of length 10 or more.
But say I wanted to find a path from hank to gus that does not go through glenn?
The query I've tried is this:
START hank=node(68), gus=node(66), glenn=node(59)
MATCH p =(hank)-[:FRIENDS*..20]-(gus)
WHERE NOT glenn IN nodes(p)
RETURN p
ORDER BY length(p)
LIMIT 1;
This works on very small graphs (30 or so people), but with thousands of people the JVM runs out of heap space.
So I'm guessing Cypher finds ALL paths between gus and hank of length 20 or less, and then applies the WHERE filter? It's clear why that would be slow.
In an abstract sense, this algorithm should be doable with the same big O runtime, because all that would change is that you check to make sure each node (as you search) isn't the one you want to avoid.
Any suggestions for how to accomplish this? I'm pretty new to Cypher.
If this is not possible with Cypher, can you recommend some other database and graph language "stack"?
Thanks
Can you test the performance of the following query? The main difference is that it compares paths instead of nodes. I've added a direction in the paths as well, as that will speed up the query.
START hank=node(68), gus=node(66), glenn=node(59)
MATCH p = allshortestPaths((hank)-[:FRIENDS]->(gus))
WITH COLLECT(p) AS gusPaths, hank, glenn
MATCH p2 = allshortestPaths((hank)-[:FRIENDS]->(glenn))
WITH COLLECT(p2) AS glennPaths, gusPaths
WITH filter(x IN gusPaths
WHERE NONE (x2 IN glennPaths
WHERE x = x2)) AS filtered
RETURN filtered
ORDER BY length(filtered)
LIMIT 1
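The point made in the question, that skipping one node should not change the cost of a shortest-path search, can be illustrated with a Python sketch of breadth-first search that simply never enqueues the excluded node. It assumes an in-memory dict of FRIENDS neighbours with hypothetical names ("walt" and "jesse" are made up for the example); it is not how Neo4j executes shortestPath internally.

```python
from collections import deque


def shortest_path_avoiding(adj, source, target, avoid, max_hops=None):
    """BFS shortest path from source to target that never visits `avoid`.

    Returns the list of nodes on the path, or None if no such path exists
    (within max_hops, if given). Still O(V + E): the excluded node is never enqueued.
    """
    if source == avoid or target == avoid:
        return None
    parent = {source: None}
    depth = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            path = []
            while u is not None:       # walk parent links back to the source
                path.append(u)
                u = parent[u]
            return path[::-1]
        if max_hops is not None and depth[u] == max_hops:
            continue                   # like the *..20 upper bound
        for v in adj.get(u, ()):
            if v != avoid and v not in parent:
                parent[v] = u
                depth[v] = depth[u] + 1
                queue.append(v)
    return None


friends = {
    "hank": ["glenn", "walt"],
    "glenn": ["hank", "gus"],
    "gus": ["glenn", "jesse"],
    "walt": ["hank", "jesse"],
    "jesse": ["walt", "gus"],
}
print(shortest_path_avoiding(friends, "hank", "gus", "glenn"))              # ['hank', 'walt', 'jesse', 'gus']
print(shortest_path_avoiding(friends, "hank", "gus", "glenn", max_hops=2))  # None
```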
