Neo4j Cypher find all paths exploring sorted relationships - neo4j

I'm struggling for days to find a way for finding all paths (to a maximum length) between two nodes while controlling the path exploration by Neo4j by sorting the relationships that are going to be explored (by one of their properties).
So to be clear, lets say I want to find K best paths between two nodes until a maximum length M. The query will be like:
match (source{name:"source"}), (target{name:"target"}),
p = (source)-[*..M]->(target)
return p order by length(p) limit K;
So far so good. But lets say the relationships of the path have a property called "priority". What I want is to write a query that tells Neo4j on each step of path exploration which relationships should be explored first.
I know that can be possible when I use the java libraries and an embedded database (By implementing PathExpander interface and giving it as input to the GraphAlgoFactory.allSimplePaths() function in Java).
But now I'm trying to find a way doing this in a server mode database access using Bolt or REST api.
Is there any way to do this in the server mode? Or maybe using Java libraries functions while accessing the graph in server mode?

use labels and an index to find your two start-nodes
perhaps consider allShortestPaths to make it faster
try this:
match (source{name:"source"}), (target{name:"target"}),
p = (source)-[rels:*..20]->(target)
return p, reduce(prio=0, r IN rels | prio + r.priority) as priority
order by priority ASC, length(p)
limit 100;

I had a very similar problem. I was trying to find the shortest path from one node to all other nodes. I had written a query similar to the one in the answer above (https://stackoverflow.com/a/38030536/783836) and couldn't get it to perform in any reasonable time.
Asking Can Graph DBs perform well with unspecified end nodes? pointed me to the solution: the Single Shortest Path algorithm.
In Neo4j you need to install the Graph Data Science Library and make use of this function: gds.alpha.shortestPath.deltaStepping.stream

Related

Cypher: Find any path between nodes

I have a neo4j graph that looks like this:
Nodes:
Blue Nodes: Account
Red Nodes: PhoneNumber
Green Nodes: Email
Graph design:
(:PhoneNumber) -[:PART_OF]->(:Account)
(:Email) -[:PART_OF]->(:Account)
The problem I am trying to solve is to
Find any path that exists between Account1 and Account2.
This is what I have tried so far with no success:
MATCH p=shortestPath((a1:Account {accId:'1234'})-[]-(a2:Account {accId:'5678'})) RETURN p;
MATCH p=shortestPath((a1:Account {accId:'1234'})-[:PART_OF]-(a2:Account {accId:'5678'})) RETURN p;
MATCH p=shortestPath((a1:Account {accId:'1234'})-[*]-(a2:Account {accId:'5678'})) RETURN p;
MATCH p=(a1:Account {accId:'1234'})<-[:PART_OF*1..100]-(n)-[:PART_OF]->(a2:Account {accId:'5678'}) RETURN p;
Same queries as above without the shortest path function call.
By looking at the graph I can see there is a path between these 2 nodes but none of my queries yield any result. I am sure this is a very simple query but being new to Cypher, I am having a hard time figuring out the right solution. Any help is appreciated.
Thanks.
All those queries are along the right lines, but need some tweaking to make work. In the longer term, though, to get a better system to easily search for connections between accounts, you'll probably want to refactor your graph.
Solution for Now: Making Your Query Work
The path between any two (n:Account) nodes in your graph is going to look something like this:
(a1:Account)<-[:PART_OF]-(:Email)-[:PART_OF]->(ai:Account)<-[:PART_OF]-(:PhoneNumber)-[:PART_OF]->(a2:Account)
Since you have only one type of relationship in your graph, the two nodes will thus be connected by an indeterminate number of patterns like the following:
<-[:PART_OF]-(:Email)-[:PART_OF]->
or
<-[:PART_OF]-(:PhoneNumber)-[:PART_OF]->
So, your two nodes will be connected through an indeterminate number of intermediate (:Account), (:Email), or (:PhoneNumber) nodes all connected by -[:PART_OF]- relationships of alternating direction. Unfortunately to my knowledge (and I'd love to be corrected here), using straight cypher you can't search for a repeated pattern like this in your current graph. So, you'll simply have to use an undirected search, to find nodes (a1:Account) and(a2:Account) connected through -[:PART_OF]- relationships. So, at first glance your query would look like this:
MATCH p=shortestPath((a1:Account { accId: {a1_id} })-[:PART_OF*]-(a2:Account { accId: {a2_id} }))
RETURN *
(notice here I've used cypher parameters rather than the integers you put in the original post)
That's very similar to your query #3, but, like you said - it doesn't work. I'm guessing what happens is that it doesn't return a result, or returns an out of memory exception? The problem is that since your graph has circular paths in it, and that query will match a path of any length, the matching algorithm will literally go around in circles until it runs out of memory. So, you want to set a limit, like you have in query #4, but without the directions (which is why that query doesn't work).
So, let's set a limit. Your limit of 100 relationships is a little on the large side, especially in a cyclical graph (i.e., one with circles), and could potentially match in the region of 2^100 paths.
As a (very arbitrary) rule of thumb, any query with a potential undirected and unlabelled path length of more than 5 or 6 may begin to cause problems unless you're very careful with your graph design. In your example, it looks like these two nodes are connected via a path length of 8. We also know that for any two nodes, the given minimum path length will be two (i.e., two -[:PART_OF]- relationships, one into and one out of a node labelled either :Email or :PhoneNumber), and that any two accounts, if linked, will be linked via an even number of relationships.
So, ideally we'd set out our relationship length between 2 and 10. However, cypher's shortestPath() function only supports paths with a minimum length of either 0 or 1, so I've set it between 1 and 10 in the example below (even though we know that in reality, the shortest path have a length of at least two).
MATCH p=shortestPath((a1:Account { accId: {a1_id} })-[:PART_OF*1..10]-(a2:Account { accId: {a2_id} }))
RETURN *
Hopefully, this will work with your use case, but remember, it may still be very memory intensive to run on a large graph.
Longer Term Solution: Refactor Graph and/or Use APOC
Depending on your use case, a better or longer term solution would be to refactor your graph to be more specific about relationships to speed up query times when you want to find accounts linked only by email or phone number - i.e. -[:ACCOUNT_HAS_EMAIL]- and -[:ACCOUNT_HAS_PHONE]-. You may then also want to use APOC's shortest path algorithms or path finder functions, which will most likely return a faster result than using cypher, and allow you to be more specific about relationship types as your graph expands to take in more data.

Cypher: Ordering Nodes on Same Level by Property on Relationship

I am new to Neo4j and currently playing with this tree structure:
The numbers in the yellow boxes are a property named order on the relationship CHILD_OF.
My goal was
a) to manage the sorting order of nodes at the same level through this property rather than through directed relationships (like e.g. LEFT, RIGHT or IS_NEXT_SIBLING, etc.).
b) being able to use plain integers instead of complete paths for the order property (i.e. not maintaining sth. like 0001.0001.0002).
I can't however find the right hint on how or if it is possible to recursively query the graph so that it keeps returning the nodes depth-first but for the sorting at each level consider the order property on the relationship.
I expect that if it is possible it might include matching the complete path iterating over it with the collection utilities of Cypher, but I am not even close enough to post some good starting point.
Question
What I'd expect from answers to this question is not necessarily a solution, but a hint on whether this is a bad approach that would perform badly anyways. In terms of Cypher I am interested if there is a practical solution to this.
I have a general idea on how I would tackle it as a Neo4j server plugin with the Java traversal or core api (which doesn't mean that it would perform well, but that's another topic), so this question really targets the design and Cypher aspect.
This might work:
match path = (n:Root {id:9})-[:CHILD_OF*]->(m)
WITH path, extract(r in rels(path) | r.order) as orders
ORDER BY orders
if it complains about sorting arrays then computing a number where each digit (or two digits) are your order and order by that number
match path = (n:Root {id:9})-[:CHILD_OF*]->(m)
WITH path, reduce(a=1, r in rels(path) | a*10+r.order) as orders
ORDER BY orders

Better Way to remove cycles from a path in neo4j graph

I am using neo4j graph database version 2.1.7. Brief Details around data:
2 million nodes with 6 different type of nodes, 5 million relationships with only 5 different type of relationships and mostly connected graph but contains a few isolated subgraphs.
While resolving paths, i get cycles in path. And to restrict that, i used the solution shared in below:
Returning only simple paths in Neo4j Cypher query
Here is the Query, i am using:
MATCH (n:nodeA{key:905728})
MATCH path = n-[:rel1|rel2|rel3|rel4*0..]->(c:nodeA)-[:rel5*0..1]->(b:nodeA)
WHERE ALL(a in nodes(path) where 1=length (filter (m in nodes(path) where m=a)))
and (length(EXTRACT (p in NODES(path)| p.key)) > 1)
and ((exists ((c)-[:rel5]->(b)) and (not exists((b)-[:rel1|rel2|rel3|rel4]->(:nodeA)) OR ANY (x in nodes(path) where (b)-[]->(x))))
OR (not exists ((c)-[:rel5]->()) and (not exists ((c)-[:rel1|rel2|rel3|rel4]->(:nodeA)) OR ANY (x in nodes(path) where (c)-[]->(x)))))
RETURN distinct EXTRACT (rp in Rels(path)| type(rp)), EXTRACT (p in NODES(path)| p.key);
The above query solves mine requirement but is not cost effective and keeps running if is run for huge subgraph. I have used 'Profile' command to improve query performance from what i started with. But, now stuck at this point. The performance has improved but, not what i expected from neo4j :(
I don't know that I have a solution, but I have a number of suggestions. Some might speed things up, some might just make the query easier to read.
Firstly, rather than putting exists ((c)-[:rel5]->(b)) in your WHERE, I believe you can put it in your MATCH like this:
MATCH path = n-[:rel1|rel2|rel3|rel4*0..]->(c:nodeA)-[:rel5*0..1]->(b:nodeA), (c)-[:rel5]->(b)
I don't think you need the exists keyword. I think you can just say, for example, (NOT (b)-[:rel1|rel2|rel3|rel4]->(:nodeA))
I'd also suggest thinking about the WITH clause for potential performance improvements.
A couple of notes about your variable paths: In *0.. the 0 means that your potentially looking for a self-reference. That may or may not be what you want. Also, leaving the variable path open ended can often cause performance problems (as I think you're seeing). If you can possibly cap it that may help.
Also, if you upgrade to 2.2.1, there are a number of built-in performance improvements with the 2.2.x line, but you also get visual PROFILEing in the console and a new EXPLAIN command which both profiles and tells you the real performance of the query after running it.
One thing to consider too is that I don't think you're hitting performance boundaries of Neo4j but rather, perhaps, you're potentially hitting some boundaries of Cypher. If so, I might suggest you do your querying with the Java APIs that Neo4j provides for better performance and more control. This can either be via embedding your database if you're using a JVM-compatible language or by writing an unmanaged extension which lets you do your own querying in java but provide a custom REST API from the server
Did a couple of more tweaks to my query as suggested above by Brian. And found improvement in query response time. Now, It takes almost 20% of time in execution compared to my original query and the current query makes almost 60% less db hits, compared to the query i shared earlier, during query execution. PFB the updated query:
MATCH (n:nodeA{key:905728})
MATCH path = n-[:rel1|rel2|rel3|rel4*1..]->(c:nodeA)-[:rel5*0..1]->(b:nodeA)
WHERE ALL(a in nodes(path) where 1=length (filter (m in nodes(path) where m=a)))
and (length(path) > 0)
and ((exists ((c)-[:rel5]->(b)) and (not ((c)-[:rel1|rel2|rel3|rel4]->()) OR ANY (x in nodes(path) where (c)-[]->(x))))
OR (not exists ((c)-[:rel5]->()) and (not ((c)-[:rel1|rel2|rel3|rel4]->()) OR ANY (x in nodes(path) where (c)-[]->(x)))))
RETURN distinct EXTRACT (rp in Rels(path)| type(rp)), EXTRACT (p in NODES(path)| p.key);
And observed dramatic improvement when capped the path from *1.. to *1..15. Also, removed one filter from query which too was taking longer time.
But, the query response time increased when queried on nodes having relationships more than 18-20 depths.
I would advise to use profile command oftenly to find pain points in your query. That would help you resolve the issues faster.
Thanks Brian.

How to implement Dijkstra's algorithm in Neo4j using Cypher

My question is: is it possible to implement Dijkstra's algorithm using Cypher? the explanation on the neo4j website only talks about REST API and it is very difficult to understand for a beginner like me
Please note that I want to find the shortest path with the shortest distance between two nodes, and not the shortest path (involving least number of relationships) between two nodes. I am aware of the shortestPath algorithm that is very easy to implement using Cypher, but it does not serve my purpose.
Kindly guide me on how to proceed if I have a graph database with nodes, and relationships between the nodes having the property 'distance'. All I want is to write a code with the help of which we will be able to find out the shortest distance between two nodes in the database. Or any tips if I need to change my approach and use some other program for this?
In this case you can implement the allShortestPaths, ordering the paths in an ascending order based on your distance property of the relationships and return only one, based on your last post it would be something like this :
MATCH (from: Location {LocationName:"x"}), (to: Location {LocationName:"y"}) ,
paths = allShortestPaths((from)-[:CONNECTED_TO*]->(to))
WITH REDUCE(dist = 0, rel in rels(paths) | dist + rel.distance) AS distance, paths
RETURN paths, distance
ORDER BY distance
LIMIT 1
No, it's not possible in a reasonable way unless you use transactions and basically rewrite the algorhythm.
The previous answer is wrong as longer but less expensive paths will not be returned by the allShortestPaths subset. You will be filtering a subset of paths that have been chosen without considering relationship cost.

Cypher / Efficiency about relationship cardinality

Using Neo4j 2.X and Cypher, I want to query all Users that I know directly or via a friend.
I would expect something like this:
MATCH (me:User("123"))-[:KNOWS*1..2]-(friend) //does not work of course
I think about the shortestPath function, but wouldn't it be too expensive?
Moreover, if I have this query:
MATCH (a)-[:SOME_REL]->(b)<-[:OWNS_BY]-(me:User("123")) // would load the whole in memory before filtering by knowledge !
WITH shortestPath((me)-[:KNOWS*..2]-(friend)) as path
WHERE path.length <= 2
OR
MATCH (a)-[:SOME_REL]->(b)<-[:OWNS_BY]-(me:User("123")) // would load the whole in memory before filtering by knowledge !
MATCH path = shortestPath((me)-[:KNOWS*..2]-(friend))
WHERE path.length <= 2
Wouldn't it be more (maybe too in the case of a huge graph?) expensive?
Indeed, this would be better, if it worked:
MATCH (a)-[:SOME_REL]->(b)<-[:OWNS_BY]-(me:User("123"))-[:KNOWS*1..2]-(friend)
loading in memory only appropriate path.
I could also use an alternative like this:
OPTIONAL MATCH (a)-[:SOME_REL]->(b)<-[:OWNS_BY]-(me:User("123"))-[:KNOWS]-(friend)
OPTIONAL MATCH (a)-[:SOME_REL]->(b)<-[:OWNS_BY]-(me:User("123"))-[:KNOWS]-()-[:KNOWS]-(friend)
but imagine if I wanted three degrees of separation (for knowledge)... the query would be very redundant.
Is there a good syntax that would lead to a very efficient query?
What should I use?
I'm not sure I completely understand, and I think that your first query would work?
MATCH (me:User{userId:123})-[:KNOWS*1..2]-(friend:User)
WHERE me <> friend
RETURN friend
It's hard to know what to write for the other queries as the OWNS_BY and SOME_REL components seem unrelated to the friend of a friend component, if you could relate the two halves of the query with a concrete example I can explain an optimal approach.
Some key pointers are that you should
Start your queries with what you think will match the minimum set of nodes (to constrain the work that has to be done).
Make sure all query components utilise labels and relationship types.
Create indexes on properties that you will be using in lookups.
An excellent resource for query optimisation is Wes Freeman's Pragmatic Optimisation.
The size of the graph does not need to make the queries more expensive as you will mostly be working on a subgraph which presumably have more fixed sized bounds. Of course if your queries need to span the entire graph then the size will become an issue for speed!

Resources