neo4j query for path

I have the following nodes in my graph:
Car
Trash
CarToTrash
[:has_input]-(Car)
[:has_output]-(Trash)
RecycleTrash
[:has_input]-(Trash)
[:has_output]-(Car)
I'm trying to find a query which will give me all shortest paths between the two types, i.e.
(Car)-[has_input]-(CarToTrash)-[has_output]-(Trash)-[has_input]-(RecycleTrash)-[has_output]-(Car)
The length of the path can vary, though. It can include more nodes like XToY, each with a has_input and a has_output relation. I'd like to find the shortest path between any two types I might add to the graph. CarToTrash and RecycleTrash represent functions, and the relations has_input and has_output are the input type and return type of the function. Basically, what I have is a graph of types and functions, and I'd like to check whether there is a path of functions between any two arbitrary types in the graph.
I've tried the following query, which works somewhat, but it would also find paths that do not follow the has_input, has_output pattern if those existed. I also tried finding the way from Car back to Car, which I was unable to do; I could only find Car to Trash. I might manage without it, though, if it's not possible to query this kind of loop.
MATCH car, trash WHERE car.uid='Car' AND trash.uid='trash'
WITH car, trash MATCH p = allShortestPaths(car-[*..15]-trash) return p;

Since you want structure in the segments of your shortest path, I believe this algorithm is not usable for you.
You should probably devise your own traversal algorithm and use the Java API to do it, based on http://docs.neo4j.org/chunked/stable/tutorial-traversal-java-api.html, which currently gives you even more flexibility than Cypher here.
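To make the idea of a custom traversal concrete, here is a minimal sketch in Python rather than the Java API. It assumes the graph has been copied into a plain dictionary (the FUNCTIONS table and the function name shortest_function_chain are made up for illustration), and it only follows the has_input, has_output pattern by construction. It returns one shortest chain, not all of them.

```python
from collections import deque

# Hypothetical in-memory copy of the graph: function -> (input type, output type).
FUNCTIONS = {
    "CarToTrash": ("Car", "Trash"),
    "RecycleTrash": ("Trash", "Car"),
}


def shortest_function_chain(functions, source, target):
    """Return one shortest list of function names turning source into target.

    At least one function is always applied, so source == target finds a loop.
    Returns None when no chain exists.
    """
    # Index the functions by the type they accept (the has_input side).
    by_input = {}
    for name, (input_type, _output_type) in functions.items():
        by_input.setdefault(input_type, []).append(name)

    queue = deque([(source, [])])
    expanded = set()  # types whose outgoing functions were already followed
    while queue:
        current, chain = queue.popleft()
        if current in expanded:
            continue
        expanded.add(current)
        for name in by_input.get(current, []):
            output_type = functions[name][1]  # the has_output side
            if output_type == target:
                return chain + [name]
            queue.append((output_type, chain + [name]))
    return None
```

Because the target is only checked after a function has been applied, asking for Car to Car walks the loop through Trash instead of returning an empty path.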

Related

Cypher return multiple hops through pattern of relationships and nodes

I'm making a proof of concept access control system with neo4j at work, and I need some help with Cypher.
The data model is as follows:
(:User|Business)-[:can]->(:Permission)<-[:allows]-(:Business)
Now I want to get a path from a User or a Business to all the Business nodes that you can reach through the
-[:can]->(:Permission)<-[:allows]-
pattern. I have managed to write a MATCH that gets me halfway there:
MATCH
path =
(:User {userId: 'e96cca53-475c-4534-9fe1-06671909fa93'})-[:can|allows*]-(b:Business)
but this doesn't have any directions, and I can't figure out how to include the directions without reducing the returned matches to only the direct matches (i.e. it doesn't continue after the first hit on a :Business node).
So what I'm wondering is:
Is there a way to match multiple of these hops in one query?
Should I model this entirely differently?
Am I on the wrong path completely, and should the query be completely rewritten?
Currently the syntax of variable-length expansions doesn't allow fine control for separate directions for certain types. There are improvements in the pipeline around this, but for the moment Cypher alone won't get you what you want.
We can use APOC Procedures for this, as fine control of the direction of types in the expansion, and sequences of relationships, are supported in the path expander procs.
First, though, you'll need to figure out how to address your user-or-business match. You can either add a common label to these nodes, so that you can MATCH either type by property, or use a subquery with two UNIONed queries, one for :Business nodes and the other for :User nodes. That way you can still take advantage of an index on either and get the possible results in a single variable.
Once you've got that, you can use apoc.path.expandConfig(), passing some options to get what you want:
// assume you've matched to your `start` node already
CALL apoc.path.expandConfig(start, {relationshipFilter:'can>|<allows', labelFilter:'>Business'}) YIELD path
RETURN path
This one doesn't use sequences, but it does restrict the direction of expansion per relationship type. We are also setting the labelFilter such that :Business nodes are the end node of the path and not nodes of any other label.
You can specify the path as follows:
MATCH path = (:User {userId: $id})-[:can]->(:Permission)
<-[:allows]-(:Business)
RETURN path
This should return the results you're after.
I see a good solution has been provided via path expanding APOC procedures.
But I'll focus on your point #2: "Should I model this entirely differently?"
Well, not entirely but I think yes.
The really liberating part of working with Neo4j is that you can change the road you are driving over as easily as you can change your driving strategy: model vs query. And since you are at an early stage in your project, you can experiment with different models. There's a good opportunity to make just a semantic change to make an 'end run' around the problem.
The semantics of a relationship in Neo4j are expressed through
the mandatory TYPE you assign to the relationship, combined with
the direction you choose to point the mandatory arrow
The trick you solved with APOC was how to traverse a path of relationships that alternate between pointing forward and backward along the query's path. But before reaching for a power tool, why not just reverse the direction of one of your relationship types? You can change the model for allows from
<-[:allows]-
to
-[:is_allowed_by]->
and that buys you a lot. Now the directions of both relationships are the same and you can combine both relationships into a single relationship in the match pattern. And the path traversal can be expressed like this, short & sweet:
(u:User)-[:can|is_allowed_by*]->(b:Business)
That will literally go to all lengths to find every user-to-business path, branching included.

Neo4j stop searching an undirected path when given node is encountered

I have the following test data in Neo4j:
merge (n1:device {name:"n1"})-[:phys {name:"phys"}]->(:interface {name:"n1a"})-[:cable {name:"cable"}]->(:interface {name:"n2a"})-[:phys {name:"phys"}]->(n2:device {name:"n2"})
merge (n1)-[:phys {name:"phys"}]->(:interface {name:"n1b"})-[:cable {name:"cable"}]->(:interface {name:"n2b"})-[:phys {name:"phys"}]->(n2)
merge (n1)-[:phys {name:"phys"}]->(:interface {name:"n1c"})-[:cable {name:"cable"}]->(:interface {name:"n2c"})-[:phys {name:"phys"}]->(n2)
merge (n1)-[:phys {name:"phys"}]->(:interface {name:"n1d"})-[:cable {name:"cable"}]->(:interface {name:"n2d"})
Giving:
While this example has exactly 3 relationships and 2 nodes on each of the 4 paths between n1 and n2, my real data could have many more, and also many more paths.
This is an undirected graph, and in the real dataset the relationships on parts of each path can point in either direction.
I know that every path starts at a :device and either just ends at a non :device or ends at a :device, and along the way there could be any number of relationships and other non :device nodes.
So I am looking to do:
match p=(:device {name:"n1"})-[*]-(:device) return (p)
and have it return the same number of records (I would be happy with double) as:
match p=(:device {name:"n1"})-[*]->(:device) return (p)
So I am looking for a way to stop matching relationships and cease following the path when the first (:device) is encountered in the path.
From my limited understanding, I could easily achieve this by making every relationship bidirectional. However I have avoided that option to date as I have read it is bad practice.
Extra for experts :-)
Additionally, I would like a way to return any full paths that don't end at a :device (eg, the bottom one)
Thanks
This is a use case that is a little hard to do with just Cypher, as we don't have a way to specify "follow a variable-length path and stop when you reach another node of this type".
We can do something like this when we use LIMIT, but that becomes too restrictive when we don't know how many results there will be, or we need to do this for multiple starting nodes.
Because of this, there are some APOC path finder procedures that include more flexible options. One of these is a labelFilter option, which lets you describe how to filter nodes with particular labels found during expansion (blacklisting, whitelisting, etc.). One of these filters is called a termination filter (it uses a / symbol before the appropriate label), which means to include the path to that node as a result and stop expansion, which is exactly what you're looking for.
After you install APOC, you can use the apoc.path.expandConfig() procedure, starting from your start node, and supply the labelFilter config parameter to get this behavior:
MATCH (start:device {name:"n1"})
CALL apoc.path.expandConfig(start, {labelFilter:'/device'}) YIELD path
RETURN path
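To show what "include the path and stop expanding" means in practice, here is a rough Python sketch of that behaviour on the test data from the question. It is not how APOC is implemented; the EDGES list, the DEVICES set and the function name expand_until_device are assumptions made for illustration. Relationships are followed in either direction, and no relationship is reused within a single path.

```python
# Hypothetical in-memory copy of the test data: one (start, end) pair per relationship.
EDGES = [
    ("n1", "n1a"), ("n1a", "n2a"), ("n2a", "n2"),
    ("n1", "n1b"), ("n1b", "n2b"), ("n2b", "n2"),
    ("n1", "n1c"), ("n1c", "n2c"), ("n2c", "n2"),
    ("n1", "n1d"), ("n1d", "n2d"),
]
DEVICES = {"n1", "n2"}


def expand_until_device(edges, devices, start):
    """Return every undirected path from start that ends at the first device met."""
    # Ignore direction: register each relationship under both of its nodes.
    neighbours = {}
    for index, (a, b) in enumerate(edges):
        neighbours.setdefault(a, []).append((index, b))
        neighbours.setdefault(b, []).append((index, a))

    paths = []
    stack = [([start], frozenset())]
    while stack:
        nodes, used = stack.pop()
        for index, nxt in neighbours.get(nodes[-1], []):
            if index in used:
                continue  # never reuse a relationship within one path
            extended = nodes + [nxt]
            if nxt in devices:
                paths.append(extended)  # emit the path and stop expanding it
            else:
                stack.append((extended, used | {index}))
    return paths
```

On this data the sketch yields the three paths from n1 to n2. The branch through n1d and n2d never reaches a device, so it is not returned here.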

Cypher: Find any path between nodes

I have a neo4j graph that looks like this:
Nodes:
Blue Nodes: Account
Red Nodes: PhoneNumber
Green Nodes: Email
Graph design:
(:PhoneNumber) -[:PART_OF]->(:Account)
(:Email) -[:PART_OF]->(:Account)
The problem I am trying to solve is to
Find any path that exists between Account1 and Account2.
This is what I have tried so far with no success:
MATCH p=shortestPath((a1:Account {accId:'1234'})-[]-(a2:Account {accId:'5678'})) RETURN p;
MATCH p=shortestPath((a1:Account {accId:'1234'})-[:PART_OF]-(a2:Account {accId:'5678'})) RETURN p;
MATCH p=shortestPath((a1:Account {accId:'1234'})-[*]-(a2:Account {accId:'5678'})) RETURN p;
MATCH p=(a1:Account {accId:'1234'})<-[:PART_OF*1..100]-(n)-[:PART_OF]->(a2:Account {accId:'5678'}) RETURN p;
Same queries as above without the shortest path function call.
By looking at the graph I can see there is a path between these 2 nodes but none of my queries yield any result. I am sure this is a very simple query but being new to Cypher, I am having a hard time figuring out the right solution. Any help is appreciated.
Thanks.
All those queries are along the right lines, but need some tweaking to work. In the longer term, though, to get a better system to easily search for connections between accounts, you'll probably want to refactor your graph.
Solution for Now: Making Your Query Work
The path between any two (n:Account) nodes in your graph is going to look something like this:
(a1:Account)<-[:PART_OF]-(:Email)-[:PART_OF]->(ai:Account)<-[:PART_OF]-(:PhoneNumber)-[:PART_OF]->(a2:Account)
Since you have only one type of relationship in your graph, the two nodes will thus be connected by an indeterminate number of patterns like the following:
<-[:PART_OF]-(:Email)-[:PART_OF]->
or
<-[:PART_OF]-(:PhoneNumber)-[:PART_OF]->
So, your two nodes will be connected through an indeterminate number of intermediate (:Account), (:Email), or (:PhoneNumber) nodes, all connected by -[:PART_OF]- relationships of alternating direction. Unfortunately, to my knowledge (and I'd love to be corrected here), using straight Cypher you can't search for a repeated pattern like this in your current graph. So, you'll simply have to use an undirected search to find nodes (a1:Account) and (a2:Account) connected through -[:PART_OF]- relationships. So, at first glance your query would look like this:
MATCH p=shortestPath((a1:Account { accId: $a1_id })-[:PART_OF*]-(a2:Account { accId: $a2_id }))
RETURN *
(notice here I've used Cypher parameters rather than the literal IDs you put in the original post)
That's very similar to your query #3, but, like you said - it doesn't work. I'm guessing what happens is that it doesn't return a result, or returns an out of memory exception? The problem is that since your graph has circular paths in it, and that query will match a path of any length, the matching algorithm will literally go around in circles until it runs out of memory. So, you want to set a limit, like you have in query #4, but without the directions (which is why that query doesn't work).
So, let's set a limit. Your limit of 100 relationships is a little on the large side, especially in a cyclical graph (i.e., one with cycles), and could potentially match in the region of 2^100 paths.
As a (very arbitrary) rule of thumb, any query with a potential undirected and unlabelled path length of more than 5 or 6 may begin to cause problems unless you're very careful with your graph design. In your example, it looks like these two nodes are connected via a path length of 8. We also know that for any two nodes, the given minimum path length will be two (i.e., two -[:PART_OF]- relationships, one into and one out of a node labelled either :Email or :PhoneNumber), and that any two accounts, if linked, will be linked via an even number of relationships.
So, ideally we'd set our relationship length between 2 and 10. However, Cypher's shortestPath() function only supports paths with a minimum length of either 0 or 1, so I've set it between 1 and 10 in the example below (even though we know that in reality, the shortest path has a length of at least two).
MATCH p=shortestPath((a1:Account { accId: $a1_id })-[:PART_OF*1..10]-(a2:Account { accId: $a2_id }))
RETURN *
Hopefully, this will work with your use case, but remember, it may still be very memory intensive to run on a large graph.
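For intuition about what a bounded, undirected shortest-path search does, here is a small Python sketch. It is only an illustration, not what Neo4j runs internally; the PART_OF data, the node names and the function name shortest_path are made up, and max_hops plays the role of the upper bound in *1..10.

```python
from collections import deque

# Hypothetical data: (email or phone number, account) per PART_OF relationship.
PART_OF = [
    ("email1", "acc1"), ("email1", "acc2"),
    ("phone1", "acc2"), ("phone1", "acc3"),
]


def shortest_path(edges, start, goal, max_hops=10):
    """Breadth-first search that ignores direction and gives up after max_hops."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    queue = deque([[start]])
    seen = {start}  # visiting each node once keeps cycles from looping forever
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        if len(path) - 1 >= max_hops:
            continue  # hop limit reached, do not expand this path further
        for nxt in neighbours.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

In this toy data acc1 and acc3 are four relationships apart, which matches the observation that linked accounts are always an even number of relationships away from each other.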
Longer Term Solution: Refactor Graph and/or Use APOC
Depending on your use case, a better or longer term solution would be to refactor your graph to be more specific about relationships to speed up query times when you want to find accounts linked only by email or phone number - i.e. -[:ACCOUNT_HAS_EMAIL]- and -[:ACCOUNT_HAS_PHONE]-. You may then also want to use APOC's shortest path algorithms or path finder functions, which will most likely return a faster result than using cypher, and allow you to be more specific about relationship types as your graph expands to take in more data.

Cypher: Ordering Nodes on Same Level by Property on Relationship

I am new to Neo4j and currently playing with this tree structure:
The numbers in the yellow boxes are a property named order on the relationship CHILD_OF.
My goal was
a) to manage the sorting order of nodes at the same level through this property rather than through directed relationships (like e.g. LEFT, RIGHT or IS_NEXT_SIBLING, etc.).
b) being able to use plain integers instead of complete paths for the order property (i.e. not maintaining something like 0001.0001.0002).
I can't however find the right hint on how or if it is possible to recursively query the graph so that it keeps returning the nodes depth-first but for the sorting at each level consider the order property on the relationship.
I expect that if it is possible it might include matching the complete path iterating over it with the collection utilities of Cypher, but I am not even close enough to post some good starting point.
Question
What I'd expect from answers to this question is not necessarily a solution, but a hint on whether this is a bad approach that would perform badly anyways. In terms of Cypher I am interested if there is a practical solution to this.
I have a general idea on how I would tackle it as a Neo4j server plugin with the Java traversal or core api (which doesn't mean that it would perform well, but that's another topic), so this question really targets the design and Cypher aspect.
This might work:
MATCH path = (n:Root {id:9})-[:CHILD_OF*]->(m)
// collect the order property of every relationship along the path
WITH path, m, [r IN relationships(path) | r.order] AS orders
ORDER BY orders
RETURN m
If it complains about sorting arrays, then compute a number in which each digit (or pair of digits) is one of your order values, and order by that number:
MATCH path = (n:Root {id:9})-[:CHILD_OF*]->(m)
WITH path, m, reduce(a=1, r IN relationships(path) | a*10+r.order) AS orders
ORDER BY orders
RETURN m
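A quick way to check what these two sort keys do is to try them on a toy tree outside the database. The Python sketch below assumes a made-up tree in which each node is described by the list of order values on its path from the root (PATH_ORDERS, depth_first and encode are illustrative names only); encode mirrors the reduce(a=1, ... a*10+r.order) expression.

```python
# Hypothetical tree: node -> order values on the relationships from the root.
PATH_ORDERS = {
    "a": [1],
    "a1": [1, 1],
    "a2": [1, 2],
    "b": [2],
    "b1": [2, 1],
}


def depth_first(path_orders):
    """Sort nodes by their list of order values (element-by-element comparison)."""
    return sorted(path_orders, key=lambda node: path_orders[node])


def encode(orders, base=10):
    """Fold the order values into one number, starting from 1 as in the reduce."""
    value = 1
    for order in orders:
        value = value * base + order
    return value


def by_number(path_orders):
    """Sort nodes by the single encoded number."""
    return sorted(path_orders, key=lambda node: encode(path_orders[node]))
```

On this toy data, sorting by the lists gives a, a1, a2, b, b1, which is the depth-first order. Sorting by the encoded number gives a, b, a1, a2, b1 instead, because with single-digit order values a shorter path always encodes to a smaller number. So the numeric key seems to group nodes level by level rather than depth-first, which is worth verifying against your own data before relying on it.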

implementing a 'greedy' match to find the extent of a subtree in Cypher

I have a graph that contains many 'subtrees' of items where an original item can be cloned which results in
(clone:Item)-[:clones]->(original:Item)
and a cloned item can also be cloned:
(newclone:Item)-[:clones]->(clone:Item)
the first item is created by a user:
(:User)-[:created]->(:Item)
and the clones are collected by a user:
(:User)-[:collected]->(:Item)
Given any item in the tree, I want to be able to match all the items in the tree. I'm using:
(1) match (known:Item)-[:clones*]-(others:Item)
My understanding is that this implements a 'greedy' match, traversing the tree in all directions, matching all items.
In general this works, however in some circumstances it doesn't seem to match all the items in the tree. For example, in the following query, this doesn't seem to be matching the whole subtree.
match p = (known:Item)-[r:clones*]-(others:Item) where not any(x in nodes(p) where (x)<-[:created]-(:User)) return p
Here I'm trying to find subtrees which are missing a 'created' Item (which were deleted in the source SQL database).
What I'm finding is that it is giving me false positives because it's matching only part of a particular tree. For example, if there is a tree with 5 items structured properly as described above, it seems (in some cases) to be matching a subset of the tree (maybe 2 out of 5 items), and that subset doesn't contain the created card and so is returned by the query when I didn't expect it to be.
Question
Is my logic correct or am I misunderstanding something? I'm suspecting that I'm misunderstanding paths, but I'm confused by the fact that the basic 'greedy' match works in most cases.
I think that my problem is that I've been confused because the query is finding multiple paths in the tree, some of which satisfy the test in the query and some don't. When viewed in the neo4j visualisation, the multiple paths are consolidated into what looks like the whole tree whereas the tabular results show that the match (1) above actually gives multiple paths.
I'm now thinking that I should be using collections rather than paths for this.
You are quite right that the query matches more paths than what is apparent in the browser visualization. The query is greedy in the sense that it has no upper bound for depth, but it also has no lower bound (well, strictly the lower bound is 1), which means it will emit a short path and a longer path that includes it if there are such. For data like
CREATE
(u)-[:CREATED]->(i)<-[:CLONES]-(c1)<-[:CLONES]-(c2)
the query will match paths
i<--c1
i<--c1<--c2
c1<--c2
c2-->c1
c2-->c1-->i
c1-->i
Of these paths, only the ones containing i will be filtered out by the condition NOT x<-[:CREATED]-(), leaving the paths
c1<--c2
c2-->c1
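The enumeration above can be reproduced with a few lines of Python, which may help when reasoning about larger trees. This is only a sketch of the matching semantics (undirected, variable length, no relationship used twice in a path); the CLONES list, the CREATED set and the function names are assumptions made for illustration.

```python
# Hypothetical copy of the example data: (clone, original) per CLONES relationship.
CLONES = [("c1", "i"), ("c2", "c1")]
CREATED = {"i"}  # items that have an incoming CREATED relationship


def all_clone_paths(clones):
    """Enumerate every undirected path of length >= 1, starting from every node."""
    neighbours = {}
    for index, (clone, original) in enumerate(clones):
        neighbours.setdefault(clone, []).append((index, original))
        neighbours.setdefault(original, []).append((index, clone))

    paths = []
    stack = [([node], frozenset()) for node in neighbours]
    while stack:
        nodes, used = stack.pop()
        if len(nodes) > 1:
            paths.append(tuple(nodes))  # every prefix is a match of its own
        for index, nxt in neighbours[nodes[-1]]:
            if index not in used:
                stack.append((nodes + [nxt], used | {index}))
    return paths


def paths_without_created(paths, created):
    """Keep only the paths in which no node is a created item."""
    return [path for path in paths if not any(node in created for node in path)]
```

For this data the sketch finds six paths in total, and the two that survive the filter are exactly the partial paths between c1 and c2, which is the false positive described in the question.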
You need a further condition in your pattern before that filter, a condition such that each path that passes it should contain some node x where x<-[:CREATED]-(). That way that filter condition is unequivocal. Judging from the example model/data in your question, you could try matching all directed variable depth (clone)-[:CLONES]->(cloned) paths, where the last cloned does not itself clone anything. That last cloned should be a created item, so each path found can now be expected to contain a b<-[:CREATED]-(). That is, if created items don't clone anything, something like this should work
MATCH (a)-[:CLONES*]->(b)
WHERE NOT (b)-[:CLONES]->()
AND NOT (b)<-[:CREATED]-()
RETURN a, b
This relies on only matching paths where a particular node in each path can be expected to be created. An alternative is to work on each whole tree by itself, by getting a single pointer into the tree and testing the entire tree for any created item node. The problem with your query could then be said to be that it treats c1<--c2 as if it were a full tree, and the solution is a pattern that only matches once per tree. You can then collect the nodes of the tree with the variable depth match from there. You can get such a pointer in different ways; the easiest is perhaps to provide a discriminating property to find a specific node and collect all the items in that node's tree. Perhaps something like
MATCH (i {prop:"val"})-[:CLONES*]-(c)
WITH i, collect(distinct c) as cc
WHERE NOT (
(i)<-[:CREATED]-() OR
ANY (c IN cc WHERE (c)<-[:CREATED]-())
) //etc
This is not a generic query, however, since it only works on the one tree of the one node. If you have a property pattern that is unique per tree, you can use that. You can also model your data so that each tree has exactly one relationship to a containing 'forest'.
MATCH (forest)-[:TREE]->(tree)-->(item)-[:CLONES*]-(c) // etc
If your [:COLLECTED] or some other relationship, or a combination of relationships and properties, makes a unique pattern per tree, these can also be used.
