follow all relationships but specific ones - neo4j

How can I tell cypher to NOT follow a certain relationship/edge?
E.g. I have a :NODE that is connected to another :NODE via a :BUDDY relationship. Additionally every :NODE is related to :STUFF in arbitrary depth by arbitrary edges which are NOT of type :BUDDY. I now want to add a shortcut relation from each :NODE to its :STUFF. However, I do not include :STUFF of its :BUDDIES.
(:NODE)-[:BUDDY]->(:NODE)
(:NODE)-[*]->(:STUFF)
My current query looks like this:
MATCH (n:Node)-[*]->(s:STUFF) WHERE NOT (n)-[:BUDDY]->()-[*]->(s) CREATE (n)-[:HAS]->(s)
However I have some issues with this query:
1) If I ever add a :BUDDY relationship not directly between :NODE but children of :NODE the query will use that relationship for matching. This might not be intended as I do not want to include buddies at all.
2) Explain tells me that neo4j does the match (:NODE)-[*]->(:STUFF) and then AntiSemiApply the pattern (n)-[:BUDDY]->(). As a result it matches the whole graph to then unmatch most of the found connections. This seems ineffective and the query runs slower than I like (However subjective this might sound).
One (bad) fix is to restrict the depth of (:NODE)-[*]->(:STUFF) via (:NODE)-[*..XX]->(:STUFF). However, I cannot guarantee that depth unless I use a ridiculous high number for worst case scenarios.
I'd actually just like to tell neo4j to just not follow a certain relationship. E.g. MATCH (n:NODE)-[ALLBUT(:BUDDY)*]->(s:STUFF) CREATE (n)-[:HAS]->(s). How can I achieve this without having to enumerate all allowed connections and connect them with a | (which is really fast - but I have to manually keep track of all possible relations)?

One option for this particular schema is to explicitly traverse past the point where the BUDDY relationship is a concern, and then do all the unbounded traversing you like from there. Then you only have to apply the filter to single-step relationships:
MATCH (n:Node) -[r]-> (leaf)
WHERE NOT type(r) = 'BUDDY'
WITH n, leaf
MATCH (leaf) -[*] -> (s:Stuff)
WITH n, COLLECT(DISTINCT leaf) AS leaves, COLLECT(DISTINCT s) AS stuff
RETURN n, [leaf IN leaves WHERE leaf:Stuff] + stuff AS stuffs
The other option is to install apoc and take a look at the path expander procedure, which allows you to 'blacklist' node labels (such as :Node) from your path query, which may work just as well depending on your graph. See here.

My final solution is generating a string from the set of
{relations}\{relations_id_do_not_want}
and use that for matching. As I am using an API and thus can do this generation automatically it is not as much as an inconvenience as I feared, but still an inconvenience. However, this was the only solution I was able to find since posting this question.

You could use a condition o nrelationship type:
MATCH (:Node)-[r:REL]->(:OtherNode)
WHERE NOT type(r) = 'your_unwanted_rel_type'
I have no clues about perf though

Related

Cypher return multiple hops through pattern of relationships and nodes

I'm making a proof of concept access control system with neo4j at work, and I need some help with Cypher.
The data model is as follows:
(:User|Business)-[:can]->(:Permission)<-[:allows]-(:Business)
Now I want to get a path from a User or a Business to all the Business-nodes that you can reach trough the
-[:can]->(:Permission)<-[:allows]-
pattern. I have managed to write a MATCH that gets me halfway there:
MATCH
path =
(:User {userId: 'e96cca53-475c-4534-9fe1-06671909fa93'})-[:can|allows*]-(b:Business)
but this doesn't have any directions, and I can't figure out how to include the directions without reducing the returned matches to only the direct matches (i.e it doesn't continue after the first hit on a :Business node)
So what I'm wondering is:
Is there a way to match multiple of these hops in one query?
Should I model this entirely different?
Am I on the wrong path completely and the query should be completely
rewritten
Currently the syntax of variable-length expansions doesn't allow fine control for separate directions for certain types. There are improvements in the pipeline around this, but for the moment Cypher alone won't get you what you want.
We can use APOC Procedures for this, as fine control of the direction of types in the expansion, and sequences of relationships, are supported in the path expander procs.
First, though, you'll need to figure out how to address your user-or-business match, either by adding a common label to these nodes by which you can MATCH either type by property, or you can use a subquery with two UNIONed queries, one for :Business nodes, the other for :User nodes, that way you can still take advantage of an index on either, and get possible results in a single variable.
Once you've got that, you can use apoc.path.expandConfig(), passing some options to get what you want:
// assume you've matched to your `start` node already
CALL apoc.path.expandConfig(start, {relationshipFilter:'can>|<allows', labelFilter:'>Business'}) YIELD path
RETURN path
This one doesn't use sequences, but it does restrict the direction of expansion per relationship type. We are also setting the labelFilter such that :Business nodes are the end node of the path and not nodes of any other label.
You can specify the path as follows:
MATCH path = (:User {userId: $id})-[:can]->(:Permission)
<-[:allows]-(:Business))
RETURN path
This should return the results you're after.
I see a good solution has been provided via path expanding APOC procedures.
But I'll focus on your point #2: "Should I model this entirely differently?"
Well, not entirely but I think yes.
The really liberating part of working with Neo4j is that you can change the road you are driving over as easily as you can change your driving strategy: model vs query. And since you are at an early stage in your project, you can experiment with different models. There's a good opportunity to make just a semantic change to make an 'end run' around the problem.
The semantics of a relationship in Neo4j are expressed through
the mandatory TYPE you assign to the relationship, combined with
the direction you choose to point the mandatory arrow
The trick you solved with APOC was how to traverse a path of relationships that alternate between pointing forward and backward along the query's path. But before reaching for a power tool, why not just reverse the direction of either of your relationship types. You can change the model for allows from
<-[:allows]-
to
-[:is_allowed_by]->
and that buys you a lot. Now the directions of both relationships are the same and you can combine both relationships into a single relationship in the match pattern. And the path traversal can be expressed like this, short & sweet:
(u:User)-[:can|is_allowed_by*]->(c:Company)
That will literally go to all lengths to find every user-to-company path, branching included.

Neo4j stop searching an undirected path when given node is encountered

I have the following test data in Neo4j:
merge (n1:device {name:"n1"})-[:phys {name:"phys"}]->(:interface {name:"n1a"})-[:cable {name:"cable"}]->(:interface {name:"n2a"})-[:phys {name:"phys"}]->(n2:device {name:"n2"})
merge (n1)-[:phys {name:"phys"}]->(:interface {name:"n1b"})-[:cable {name:"cable"}]->(:interface {name:"n2b"})-[:phys {name:"phys"}]->(n2)
merge (n1)-[:phys {name:"phys"}]->(:interface {name:"n1c"})-[:cable {name:"cable"}]->(:interface {name:"n2c"})-[:phys {name:"phys"}]->(n2)
merge (n1)-[:phys {name:"phys"}]->(:interface {name:"n1d"})-[:cable {name:"cable"}]->(:interface {name:"n2d"})
Giving:
While this example has exactly 3 relationships and 2 nodes on each of the 4 paths between each of n1 and n2, my real data could have many more, and also many more paths.
This is a undirected graph and in the real dataset, relationships on parts of each path are in either direction.
I know that every path starts at a :device and either just ends at a non :device or ends at a :device, and along the way there could be any number of relationships and other non :device nodes.
So I am looking to do:
match p=(:device {name:"n1"})-[*]-(:device) return (p)
and have it return the same, (I would be happy with double), number of records as:
match p=(:device {name:"n1"})-[*]->(:device) return (p)
So I am looking for a way to stop matching relationships and cease following the path when the first (:device) is encountered in the path.
From my limited understanding, I could easily achieve this by making every relationship bidirectional. However I have avoided that option to date as I have read it is bad practice.
Extra for experts :-)
Additionally, I would like a way to return any full paths that don't end at a :device (eg, the bottom one)
Thanks
This is a use case that is a little hard to do with just Cypher, as we don't have a way to specify "follow a variable-length path and stop when you reach another node of this type".
We can do something like this when we use LIMIT, but that becomes too restrictive when we don't know how many results there will be, or we need to do this for multiple starting nodes.
Because of this, there are some APOC path finder procedures that include more flexible options. One of these is a labelFilter option which lets you describe how to filter nodes with particular labels found during expansion (blacklisting, whitelisting, etc). One of these filters is called a termination filter (uses an / symbol before the appropriate label), which means to include the path to that node as a result, and stop expansion, which is exactly what you're looking for.
After you install APOC, you can use the apoc.path.expandConfig() procedure, starting from your start node, and supply the labelFilter config parameter to get this behavior:
MATCH (start:device {name:"n1"})
CALL apoc.path.expandConfig(start, {labelFilter:'/device'}) YIELD path
RETURN path

Cypher: retrieve all attached nodes of more than one type?

Beginner Cypher question. I know how to get all the nodes of a particular type attached to a particular person in my database. Here I am retrieving all the friends of a particular person, within 10 hops:
MATCH (rebecca:Person {name:"Rebecca"})-[r*1..10]->(friends:Friend)
RETURN rebecca, friends
But how would I extend this to get nodes of two types: either the friends, or the neighbours, of Rebecca?
You can filter on the label of the friends identifier :
MATCH (rebecca:Person {name:"Rebecca"})-[r*1..10]->(other)
WHERE ALL( x IN ["Friend","Neighbour"] WHERE x IN labels(other) )
RETURN rebecca, other
NB: The answer from InverseFalcon is perfectly valid, here it is just another way to do this filter.
Note that this is not really ideal, FRIEND and NEIGHBOUR are semantically best described as relationships and you can see here that when
going away from the natural way of thinking as a graph (relationships matters!) you suffer from it in your queries.
There isn't an OR we can use on the label in the MATCH itself, so you may have to filter with a WHERE clause:
MATCH (rebecca:Person {name:"Rebecca"})-[r*1..10]->(friendOrNeighbor)
WHERE friendOrNeighbor:Friend or friendOrNeighbor:Neighbor
RETURN DISTINCT rebecca, friendOrNeighbor
Keep in mind variable-length relationship matches like this are meant to find all possible paths up to the given max limit, so this is actually doing extra work that you may not need, that may be slow if there are many relationships within that local graph.
You may want to consider apoc.path.expandConfig() from APOC Procedures. If you use 'NODE_GLOBAL' for uniqueness, and specify the upper bound with maxLevel: 10, it's a much more efficient means of getting the nodes you want faster.

Back-reference neo4j

Can I use some back-reference sort of mechanism in neo4j? I'm not interested in what matches the query, just that it is the same thing in many places. Something like:
MATCH (a:Event {diagnosis1:11})
MATCH (b:Event {diagnosis1:15})
MATCH (c:Event {diagnosis1:5})
MATCH (a)-[rel:Next {PatientID:*}]->(b)
MATCH (b)-[rel1:Next {PatientID:\{1}]->(c)
The idea is that I just require the attribute IDs from both edges to be the same, without specifying it. The whole purpose of it would be not generating all possible matchings, to then filter them, but only hop in the specific places.
I've asked something similar in a more specific way here.
Edit: I know WHERE clauses can be used for that, but they filter the query AFTER matching the edges and nodes. I want to do that DURING the matching!
Use a WHERE clause with simple references, there's no need for back references:
MATCH (a)-[rel:Next]->(b)
MATCH (b)-[rel1:Next]->(c)
WHERE rel.PatientID = rel1.PatientID
Update
First of all, Cypher is a declarative query language: you express what you want, the runtime takes care of executing and optimizing it any way it can, so it's not that obvious that it would do it the way you think it will, or that using "back references" would magically solve the problem; it's just another way of writing the same thing.
So, your problem is that the match creates all the relationship pairs before filtering them. How about splitting the match in 2 phases using WITH?
MATCH (a:Event {diagnosis1:11})-[rel:Next]->(b:Event {diagnosis1:15})
WITH a, b, rel
MATCH (b)-[rel1:Next]->(c:Event {diagnosis1:5})
WHERE rel1.PatientID = rel.PatientID
That should only select the second relationships that match the first, but I'm not sure if it's an O(n^2) algorithm in Cypher's runtime.
Otherwise, if you drop to the Java API (which would mean either an extension or a procedure, depending on your version of Neo4j), you can probably implement in O(n) by
scanning all the relationships between a and b, indexing them by PatientID in some multimap (see Guava, or use a Map<K, Collection<V>>); this is O(n)
then doing the same for all the relationships between b and c, still O(n)
iterate on the keys of one multimap to get the values in both and match them, still O(n)

How can I mitigate having bidirectional relationships in a family tree, in Neo4j?

I am running into this wall regarding bidirectional relationships.
Say I am attempting to create a graph that represents a family tree. The problem here is that:
* Timmy can be Suzie's brother, but
* Suzie can not be Timmy's brother.
Thus, it becomes necessary to model this in 2 directions:
(Sure, technically I could say SIBLING_TO and leave only one edge...what I'm not sure what the vocabulary is when I try to connect a grandma to a grandson.)
When it's all said and done, I pretty sure there's no way around the fact that the direction matters in this example.
I was reading this blog post, regarding common Neo4j mistakes. The author states that this bidirectionality is not the most efficient way to model data in Neo4j and should be avoided.
And I am starting to agree. I set up a mock set of 2 families:
and I found that a lot of queries I was attempting to run were going very, very slow. This is because of the 'all connected to all' nature of the graph, at least within each respective family.
My question is this:
1) Am I correct to say that bidirectionality is not ideal?
2) If so, is my example of a family tree representable in any other way...and what is the 'best practice' in the many situations where my problem may occur?
3) If it is not possible to represent the family tree in another way, is it technically possible to still write queries in some manner that gets around the problem of 1) ?
Thanks for reading this and for your thoughts.
Storing redundant information (your bidirectional relationships) in a DB is never a good idea. Here is a better way to represent a family tree.
To indicate "siblingness", you only need a single relationship type, say SIBLING_OF, and you only need to have a single such relationship between 2 sibling nodes.
To indicate ancestry, you only need a single relationship type, say CHILD_OF, and you only need to have a single such relationship between a child to each of its parents.
You should also have a node label for each person, say Person. And each person should have a unique ID property (say, id), and some sort of property indicating gender (say, a boolean isMale).
With this very simple data model, here are some sample queries:
To find Person 123's sisters (note that the pattern does not specify a relationship direction):
MATCH (p:Person {id: 123})-[:SIBLING_OF]-(sister:Person {isMale: false})
RETURN sister;
To find Person 123's grandfathers (note that this pattern specifies that matching paths must have a depth of 2):
MATCH (p:Person {id: 123})-[:CHILD_OF*2..2]->(gf:Person {isMale: true})
RETURN gf;
To find Person 123's great-grandchildren:
MATCH (p:Person {id: 123})<-[:CHILD_OF*3..3]-(ggc:Person)
RETURN ggc;
To find Person 123's maternal uncles:
MATCH (p:Person {id: 123})-[:CHILD_OF]->(:Person {isMale: false})-[:SIBLING_OF]-(maternalUncle:Person {isMale: true})
RETURN maternalUncle;
I'm not sure if you are aware that it's possible to query bidirectionally (that is, to ignore the direction). So you can do:
MATCH (a)-[:SIBLING_OF]-(b)
and since I'm not matching a direction it will match both ways. This is how I would suggest modeling things.
Generally you only want to make multiple relationships if you actually want to store different state. For example a KNOWS relationship could only apply one way because person A might know person B, but B might not know A. Similarly, you might have a LIKES relationship with a value property showing how much A like B, and there might be different strengths of "liking" in the two directions

Resources