Cypher query to find the hop depth length of particular relationships - neo4j

I am trying to find the amount of relationships that stem originally from a parent node and I am not sure the syntax to use in order to gain access to this returned integer. I am can be sure in my code that each child node can only have one relationship of a particular type so this allows me to capture a "true" depth reading
My attempt is this but I am hoping there is a cleaner way:
MATCH p=(n {id:'123'})-[r:Foo*]->(c)
RETURN length(p)
I am not sure this is the correct syntax because it returns an array of integers with the last index being the true tally length. I am hoping for something that just returns an int instead of this mentioned array.
I am very grateful for help that you may be able to offer.

As Nicole says, in general, finding the longest path between two nodes in a graph is not feasible in any reasonable time. If your graph is very small, it is possible that you will be able to find all paths, and select the one with the most edges but this won't scale to larger graphs.
However there is a trick that you can do in certain circumstances. If your graph contains no directed cycles, you can assign each edge a weight of -1, and then look for the shortest weighted path between the source and target nodes. Since the edge weights are negative a shortest weighted path must correspond to a path with a maximum number of edges between the desired nodes.
Unfortunately, Cypher doesn't yet support shortest weighted path algorithms, however the Neo4j database engine does. The docs give an example of how to do this. You will also need to implement your own algorithm, such as Bellman-Ford using the traversal API, because Dijkstra won't work with -ve edge weights.
However, please be aware that this trick won't work if your graph contains cycles - it must be a DAG.

Your query:
MATCH p=(n {id:'123'})-[r:Foo*]->(c)
RETURN length(p)
is returning the length of ALL possible paths from n to c. You probably are only interested in the shortest path? You can use the shortestPath function to only consider the shortest path from n to c:
MATCH p = shortestPath((n {id:'123'})-[r:Foo*]->(c))
RETURN length(p)

Related

Neo4j Find alternative shortest paths between nodes

How do I find multiple distinct short paths between 2 nodes in a graph with 7.5M nodes and 20M relationships?
We want a feature similar to how google maps shows other alternative routes.
problems with:
Dijkstra, shortestPath and allShortestPaths:
Only returns the shortest path or paths with the shortest length.
Yen's k shortest paths:
Absurdly slow on a big graph
Iterate over list of numbers 0-10 and call allShortestPaths with minimum number of length of i:
Absurdly slow on a big graph
After some time, I figured that filtering/blacklisting for certain relationships that already has been used for another path could work. However, when I tried this in practice i noticed that filtering after running allShortestPaths wasn't a viable solution. After this I figured doing allShortestPaths mulitple times and when a path is found I could rename the type to currentTypeName_TEMP, then rename it after again.
To find multiple distinct paths, this could be done with an algorithm shown on the picture, and given the effectiveness of allShortestPaths, this will not be that computational or time intensive (dotted lines means found relationships).
EDIT:
turns out this is rather difficult in cypher, since it is very restricive in nature, we ended up writing a plugin, where we made a hack of dijkstra where it takes N of the shortest paths.

Cypher: Find any path between nodes

I have a neo4j graph that looks like this:
Nodes:
Blue Nodes: Account
Red Nodes: PhoneNumber
Green Nodes: Email
Graph design:
(:PhoneNumber) -[:PART_OF]->(:Account)
(:Email) -[:PART_OF]->(:Account)
The problem I am trying to solve is to
Find any path that exists between Account1 and Account2.
This is what I have tried so far with no success:
MATCH p=shortestPath((a1:Account {accId:'1234'})-[]-(a2:Account {accId:'5678'})) RETURN p;
MATCH p=shortestPath((a1:Account {accId:'1234'})-[:PART_OF]-(a2:Account {accId:'5678'})) RETURN p;
MATCH p=shortestPath((a1:Account {accId:'1234'})-[*]-(a2:Account {accId:'5678'})) RETURN p;
MATCH p=(a1:Account {accId:'1234'})<-[:PART_OF*1..100]-(n)-[:PART_OF]->(a2:Account {accId:'5678'}) RETURN p;
Same queries as above without the shortest path function call.
By looking at the graph I can see there is a path between these 2 nodes but none of my queries yield any result. I am sure this is a very simple query but being new to Cypher, I am having a hard time figuring out the right solution. Any help is appreciated.
Thanks.
All those queries are along the right lines, but need some tweaking to make work. In the longer term, though, to get a better system to easily search for connections between accounts, you'll probably want to refactor your graph.
Solution for Now: Making Your Query Work
The path between any two (n:Account) nodes in your graph is going to look something like this:
(a1:Account)<-[:PART_OF]-(:Email)-[:PART_OF]->(ai:Account)<-[:PART_OF]-(:PhoneNumber)-[:PART_OF]->(a2:Account)
Since you have only one type of relationship in your graph, the two nodes will thus be connected by an indeterminate number of patterns like the following:
<-[:PART_OF]-(:Email)-[:PART_OF]->
or
<-[:PART_OF]-(:PhoneNumber)-[:PART_OF]->
So, your two nodes will be connected through an indeterminate number of intermediate (:Account), (:Email), or (:PhoneNumber) nodes all connected by -[:PART_OF]- relationships of alternating direction. Unfortunately to my knowledge (and I'd love to be corrected here), using straight cypher you can't search for a repeated pattern like this in your current graph. So, you'll simply have to use an undirected search, to find nodes (a1:Account) and(a2:Account) connected through -[:PART_OF]- relationships. So, at first glance your query would look like this:
MATCH p=shortestPath((a1:Account { accId: {a1_id} })-[:PART_OF*]-(a2:Account { accId: {a2_id} }))
RETURN *
(notice here I've used cypher parameters rather than the integers you put in the original post)
That's very similar to your query #3, but, like you said - it doesn't work. I'm guessing what happens is that it doesn't return a result, or returns an out of memory exception? The problem is that since your graph has circular paths in it, and that query will match a path of any length, the matching algorithm will literally go around in circles until it runs out of memory. So, you want to set a limit, like you have in query #4, but without the directions (which is why that query doesn't work).
So, let's set a limit. Your limit of 100 relationships is a little on the large side, especially in a cyclical graph (i.e., one with circles), and could potentially match in the region of 2^100 paths.
As a (very arbitrary) rule of thumb, any query with a potential undirected and unlabelled path length of more than 5 or 6 may begin to cause problems unless you're very careful with your graph design. In your example, it looks like these two nodes are connected via a path length of 8. We also know that for any two nodes, the given minimum path length will be two (i.e., two -[:PART_OF]- relationships, one into and one out of a node labelled either :Email or :PhoneNumber), and that any two accounts, if linked, will be linked via an even number of relationships.
So, ideally we'd set out our relationship length between 2 and 10. However, cypher's shortestPath() function only supports paths with a minimum length of either 0 or 1, so I've set it between 1 and 10 in the example below (even though we know that in reality, the shortest path have a length of at least two).
MATCH p=shortestPath((a1:Account { accId: {a1_id} })-[:PART_OF*1..10]-(a2:Account { accId: {a2_id} }))
RETURN *
Hopefully, this will work with your use case, but remember, it may still be very memory intensive to run on a large graph.
Longer Term Solution: Refactor Graph and/or Use APOC
Depending on your use case, a better or longer term solution would be to refactor your graph to be more specific about relationships to speed up query times when you want to find accounts linked only by email or phone number - i.e. -[:ACCOUNT_HAS_EMAIL]- and -[:ACCOUNT_HAS_PHONE]-. You may then also want to use APOC's shortest path algorithms or path finder functions, which will most likely return a faster result than using cypher, and allow you to be more specific about relationship types as your graph expands to take in more data.

Is neo4j suitable for searching for paths of specific length

I am a total newcommer in the world of graph databases. But let's put that on a side.
I have a task to find a cicular path of certain length (or of any other measure) from start point and back.
So for example, I need to find a path from one node and back which is 10 "nodes" long and at the same time has around 15 weights of some kind. This is just an example.
Is this somehow possible with neo4j, or is it even the right thing to use?
Hope I clarified it enough, and thank you for your answers.
Regards
Neo4j is a good choice for cycle detection.
If you need to find one path from n to n of length 10, you could try some query like this one:
MATCH p=(n:TestLabel {uuid: 1})-[rels:TEST_REL_TYPE*10]-(n)
RETURN p LIMIT 1
The match clause here is asking Cypher to find all paths from n to itself, of exactly 10 hops, using a specific relationship type. This is called variable length relationships in Neo4j. I'm using limit 1 to return only one path.
Resulting path can be visualized as a graph:
You can also specify a range of length, such as [*8..10] (from 8 to 10 hops away).
I'm not sure I understand what you mean with:
has around 15 weights of some kind
You can check relationships properties, such as weight, in variable length paths if you need to. Specific example in the doc here.
Maybe you will also be interested in shortestPath() and allShortestPaths() functions, for which you need to know the end node as well as the start one, and you can find paths between them, even specifying the length.
Since you did not provide a data model, I will just assume that your starting/ending nodes all have the Foo label, that the relevant relationships all have the BAR type, and that your circular path relationships all point in the same direction (which should be faster to process, in general). Also, I gather that you only want circular paths of a specific length (10). Finally, I am guessing that you prefer circular paths with lower total weight, and that you want to ignore paths whose total weight exceed a bounding value (15). This query accomplishes the above, returning the matching paths and their path weights, in ascending order:
MATCH p=(f:Foo)-[rels:BAR*10]->(f)
WITH p, REDUCE(s = 0, r IN rels | s + r.weight) AS pathWeight
WHERE pathWeight <= 15
RETURN p, pathWeight
ORDER BY pathWeight;

Find Longest Path in Graph

I have been trying hard to find out longest path in a complex network. I have been through many questions in StackOverflow and Internet, but none could help me. I have written a CQL as
start n=node(*)
match p = (n)-[:LinkTo*1..]->(m)
with n,MAX(length(p)) as L
match p = (n)-[:LinkTo*1..]->(m)
where length(p) = L
return p,L
I don't get any solution. Neo4J would keep running for the answer, and I also tried executing it in Neo4J Cloud Hosting. I didn't any solution even there, but got an error
"Error undefined-undefined"
I am in dire need of a solution. The result for this answer will help me complete my project. So, anyone please help me in correcting the query.
Well for one you're doing a highly expensive operation twice when you only have to do it once.
Additionally, you are returning one path per every single node in your database, at least (as there may be multiple paths for a node that are the longest paths available for that node). But from your question it sounds like you want the single largest path in the graph, not one each for every single node.
We can also improve your match by only performing the longest-path match on nodes that are at the head of the path, and not somewhere in the middle.
Maybe try this one?
match (n)
where (n)-[:LinkTo]->() and not ()-[:LinkTo]->(n)
match p = (n)-[:LinkTo*1..]->(m)
return p, length(p) as L
order by L desc
limit 1
The problem you're trying to solve is NP-hard. On small sparse graphs a brute-force approach such as the one suggested by InverseFalcon may succeed in reasonable time, but on any reasonably large and/or densely connected graph, you will quickly run into both time and space problems.
If you have a weighted graph, you can find the longest path between 2 nodes by negating all the edge-weights, and running a shortest weighted path algorithm over the modified graph. However if you want to find the longest path in the entire graph, you are effectively trying to solve the Travelling Salesman Problem, but with -ve edge weights. You can't do that with Cypher.
If your graph is unweighted, I'd find an easier problem, or see if you can convert your graph to a weighted one and tackle it as described above. Alternatively, see if you can frame your requirements in such a way that you don't need to find a longest path.

How to implement Dijkstra's algorithm in Neo4j using Cypher

My question is: is it possible to implement Dijkstra's algorithm using Cypher? the explanation on the neo4j website only talks about REST API and it is very difficult to understand for a beginner like me
Please note that I want to find the shortest path with the shortest distance between two nodes, and not the shortest path (involving least number of relationships) between two nodes. I am aware of the shortestPath algorithm that is very easy to implement using Cypher, but it does not serve my purpose.
Kindly guide me on how to proceed if I have a graph database with nodes, and relationships between the nodes having the property 'distance'. All I want is to write a code with the help of which we will be able to find out the shortest distance between two nodes in the database. Or any tips if I need to change my approach and use some other program for this?
In this case you can implement the allShortestPaths, ordering the paths in an ascending order based on your distance property of the relationships and return only one, based on your last post it would be something like this :
MATCH (from: Location {LocationName:"x"}), (to: Location {LocationName:"y"}) ,
paths = allShortestPaths((from)-[:CONNECTED_TO*]->(to))
WITH REDUCE(dist = 0, rel in rels(paths) | dist + rel.distance) AS distance, paths
RETURN paths, distance
ORDER BY distance
LIMIT 1
No, it's not possible in a reasonable way unless you use transactions and basically rewrite the algorhythm.
The previous answer is wrong as longer but less expensive paths will not be returned by the allShortestPaths subset. You will be filtering a subset of paths that have been chosen without considering relationship cost.

Resources