How can branches get pruned in a query? - neo4j

suppose I have a tree where certain nodes have a relationship of a give type, how would I return all nodes in the tree except for those with the given relationship and its descendants.
I've gotten half way there with something like this (the tree is built on has links):
match (root: {Name: 'Root'})-[:has*]->(n) where not (n)-[:Exemption]-() return n
but, of course, this excludes nodes that have a relationship of type Exemption but not its descendants, so the descendants show up as unconnected nodes
how do I structure the query?

This query should work:
MATCH p=({Name: 'Root'})-[:has*]->(n)
WHERE NONE(x IN NODES(p) WHERE (x)-[:Exemption]-())
RETURN n;
It filters out any :has relationship paths with a node that (also) has an :Exemption relationship.

I would probably split the path up into two parts:
MATCH p=(:Label {Name: 'Root'})-[:has*]->(n) WHERE NOT EXISTS ((n)-[:Exemption]->())
MATCH p2 = (n)-[:has*]->(m) WHERE NOT (m)-[:has]->()
RETURN p,p2;

after some experimentation, this worked:
match (root: {Name: 'Root'})-[:has*]->(n)
match (m)-[:has*]->(n)
where not (m)-[:Exemption]-()
return n
which essentially says: find a path, for any node on that path, traverse back up to make sure that none of the ancestors show an exception. so what I'm negating here is the subpath created with the second match

Related

Neo4j - Exclude node from results where it has a specific relationship

I'm attempting to a set of nodes (p) where they have a relationship [:INCLUDE] to a specific node (ca) identified by its ID, but I also want to make sure I exclude any (p) node that also has an [:EXCLUDE] relationship to any other (ca) node.
I've tried the below...
MATCH (a:CloudApp)-[]-(p:Policy{state: "enabled"})
WHERE (a{id:"All"})-[]-(p) OR (a{id:"b9a97804-0c6b-4d83-8b35-84bda7f8b69c"})-[]-(p)
WITH p,a
MATCH (p)-[]-(pl:Platform {id: "macOS"})
WHERE NOT (p)-[:EXCLUDE_Platform]-(pl)
WITH p,a,pl
RETURN *
Which gets me this...
And then tried to filter it with this...
MATCH (a:CloudApp)-[]-(p:Policy{state: "enabled"})
WHERE (a{id:"All"})-[]-(p) OR (a{id:"b9a97804-0c6b-4d83-8b35-84bda7f8b69c"})-[]-(p)
WITH p,a
MATCH (p)-[]-(pl:Platform {id: "macOS"})
WHERE NOT (p)-[:EXCLUDE_Platform]-(pl) AND NOT (p)-[:EXCLUDE_CLOUDAPP]-(a)
WITH p,a,pl
RETURN *
But this results in the same 3 (p) nodes and just excludes the (a) node where that relationship exists. I've tried a few variations on the above query and always seem to get the same result...
I'm guessing that this is because it just excludes that relationship and the node remains because it has another valid relationship. I'm just not sure how to achieve what I want?
Just remove the variable a from the condition NOT (p)-[:EXCLUDE_CLOUDAPP]-(a) and add a node label, and try this:
MATCH (a:CloudApp)-[]-(p:Policy{state: "enabled"})
WHERE (a{id:"All"})-[]-(p) OR (a{id:"b9a97804-0c6b-4d83-8b35-84bda7f8b69c"})-[]-(p)
WITH p,a
MATCH (p)-[]-(pl:Platform {id: "macOS"})
WHERE NOT (p)-[:EXCLUDE_Platform]-(pl) AND NOT (p)-[:EXCLUDE_CLOUDAPP]-(:CloudApp)
WITH p,a,pl
RETURN *
This basically checks whether the policy node is not linked to any CloudApp node via exclude relationship.
So the way I solved it was this
MATCH (a:CloudApp)-[]-(p:Policy{state: "enabled"})
WHERE (p)-[:EXCLUDE_CLOUDAPP]-(a)
WITH COLLECT(p) as excluded
MATCH (a:CloudApp)-[]-(p:Policy{state: "enabled"})
WHERE (a{id:"All"})-[]-(p:Policy{state: "enabled"})
WITH p,a,excluded
MATCH (p)-[]-(pl:Platform {id: "macOS"})
WHERE NOT (p)-[:EXCLUDE_Platform]-(pl)
WITH COLLECT(p) as included, excluded
RETURN [n IN included WHERE not n IN excluded] as list
I found the nodes I didn't want, then the ones I did want and removed the excluded ones from those using the WHERE not n IN part

Return type matched from multiple relationships in Neo4j with Cypher

I know that you can match on multiple relationships in Neo4j, like this example in the docs:
MATCH (wallstreet {title: 'Wall Street'})<-[:ACTED_IN|:DIRECTED]-(person)
RETURN person.name
which returns nodes with an ACTED_IN or DIRECTED relationship to 'Wall Street'.
However, is there a way to get the type of the relationship in this query? That is, I would like to return not only the name, but also which relationship applies to him/her, in order to see if it was the ACTED_IN, or the DIRECTED relationship that caused the result to be output.
You can do the equivalent here:
MATCH (:Person {name: 'Oliver Stone'})-[r]->(movie)
RETURN type(r)
but that's just matching on any relationship. I would like to do this, but only with the two relationships specified in the clause.
Thanks
You no longer need additional colons in between valid edge types you are querying. otherwise you can use the variable just like you did in the unspecific edge case:
MATCH (:Movie{title: 'The Matrix'})<-[r:ACTED_IN|DIRECTED]-(person)
RETURN type(r), person.name

In Neo4J return root nodes when having two types of relationships

I have a one direction tree as described as:
Node_C [is_a_method_of] Node_A
Node_D [is_related_to] Node_B
Node_C [is_related_to] Node_D
Node_A and Node_B are root nodes because they are not related or are a method of other nodes. How can I return them?
I saw in another post:
MATCH (n)
WHERE NOT (n)--()
RETURN n;
But that returns orphan nodes.
You would probably want to add some relationship type and direction to your query:
MATCH (n)
WHERE NOT (n)-[:is_a_method_of]->() AND NOT (n)-[:is_related_to]->()
RETURN n;
You haven't really specified the direction of relationships in your graph, so this is only my assumption. You could adapt this query to work on your graph schema if my assumptions are wrong.

Enforce order of relations in multipath queries

I'm looking into neo4j as a Graph database, and variable length path queries will be a very important use case. I now think I've found an example query that Cypher will not support.
The main issue is that I want to treat composed relations as a single relation. Let my give an example: finding co-actors. I've done this using the standard database of movies. The goal is to find all actors that have acted alongside Tom Hanks. This can be found with the query:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN]->()<-[:ACTED_IN]-(a:Person) return a
Now, what if we want to find co-actors of co-actors recursively.
We can rewrite the above query to:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN*2]-(a:Person) return a
And then it becomes clear we can do this with
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN*]-(a:Person) return a
Notably, all odd-length paths are excluded because they do not end in a Person.
Now, I have found a query that I cannot figure out how to make recursive:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN]->()<-[:DIRECTED]-()-[:DIRECTED]->()<-[:ACTED_IN]-(a:Person) return DISTINCT a
In words, all actors that have a director in common with Tom Hanks.
In order to make this recursive I tried:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN|DIRECTED*]-(a:Person) return DISTINCT a
However, (besides not seeming to complete at all). This will also capture co-actors.
That is, it will match paths of the form
()-[:ACTED_IN]->()<-[:ACTED_IN]-()
So what I am wondering is:
can we somehow restrict the order in which relations occur in a multi-path query?
Something like:
MATCH (tom {name: "Tom Hanks"}){-[:ACTED_IN]->()<-[:DIRECTED]-()-[:DIRECTED]->()<-[:ACTED_IN]-}*(a:Person) return DISTINCT a
Where the * applies to everything in the curly braces.
The path expander procs from APOC Procedures should help here, as we added the ability to express repeating sequences of labels, relationships, or both.
In this case, since you want to match on the actor of the pattern rather than the director (or any of the movies in the path), we need to specify which nodes in the path you want to return, which requires either using the labelFilter in addition to the relationshipFilter, or just to use the combined sequence config property to specify the alternating labels/relationships expected, and making sure we use an end node filter on the :Person node at the point in the pattern that you want.
Here's how you would do this after installing APOC:
MATCH (tom:Person {name: "Tom Hanks"})
CALL apoc.path.expandConfig(tom, {sequence:'>Person, ACTED_IN>, *, <DIRECTED, *, DIRECTED>, *, <ACTED_IN', maxLevel:12}) YIELD path
WITH last(nodes(path)) as person, min(length(path)) as distance
RETURN person.name
We would usually use subgraphNodes() for these, since it's efficient at expanding out and pruning paths to nodes we've already seen, but in this case, we want to keep the ability to revisit already visited nodes, as they may occur in further iterations of the sequence, so to get a correct answer we can't use this or any of the procs that use NODE_GLOBAL uniqueness.
Because of this, we need to guard against exploring too many paths, as the permutations of relationships to explore that fit the path will skyrocket, even after we've already found all distinct nodes possible. To avoid this, we'll have to add a maxLevel, so I'm using 12 in this case.
This procedure will also produce multiple paths to the same node, so we're going to get the minimum length of all paths to each node.
The sequence config property lets us specify alternating label and relationship type filterings for each step in the sequence, starting at the starting node. We are using an end node filter symbol, > before the first Person label (>Person) indicating that we only want paths to the Person node at this point in the sequence (as the first element in the sequence it will also be the last element in the sequence as it repeats). We use the wildcard * for the label filter of all other nodes, meaning the nodes are whitelisted and will be traversed no matter what their label is, but we don't want to return any paths to these nodes.
If you want to see all the actors who acted in movies directed by directors who directed Tom Hanks, but who have never acted with Tom, here is one way:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN]->(m)
MATCH (m)<-[:ACTED_IN]-(ignoredActor)
WITH COLLECT(DISTINCT m) AS ignoredMovies, COLLECT(DISTINCT ignoredActor) AS ignoredActors
UNWIND ignoredMovies AS movie
MATCH (movie)<-[:DIRECTED]-()-[:DIRECTED]->(m2)
WHERE NOT m2 IN ignoredMovies
MATCH (m2)<-[:ACTED_IN]-(a:Person)
WHERE NOT a IN ignoredActors
RETURN DISTINCT a
The top 2 MATCH clauses are deliberately not combined into one clause, so that the Tom Hanks node will be captured as an ignoredActor. (A MATCH clause filters out any result that use the same relationship twice.)

Combining depth- and breadth-first traversals in a single cypher query

My graph is a tree structure with root and end nodes, and a line of nodes between them with [:NEXT]-> relationships from one to the next. Some nodes along that path also have [:BRANCH]-> relationships to other root nodes, and through them to other lines of nodes.
What Cypher query will return an ordered list of the nodes on the path from beginning to end, with any BRANCH relationships being included with the records for the nodes that have them?
EDIT: It's not a technical diagram, but the basic structure looks like this:
with each node depicted as a black circle. In this case, I would would want every node depicted here.
How about
MATCH p=(root)-[:NEXT*0..]->(leaf)
OPTIONAL MATCH (leaf)-[:BRANCH]->(branched)
RETURN leaf, branched, length(p) as l
ORDER BY l ASC
see also this graph-gist: http://gist.neo4j.org/?9042990
This query - a bit slow - should work (I guess):
START n=node(startID), child=node(*)
MATCH (n)-[rels*]-(child)
WHERE all(r in rels WHERE type(r) IN ["NEXT", "BRANCH"])
RETURN *
That is based on Neo4j 2.0.x Cypher syntax.
Technically this query will stop at the end of the tree started from startID: that is because the end in the diagram above belongs to a single path, but not the end of all the branches.
I would also recommend to limit the cardinality of the relationships - [rels*1..n] - to prevent the query to go away...
You wont be able to control the order in which the nodes are returned as per the depth first or breadth first algo unless you have a variable to save previous element or kind of recursive call which I dont think is not possible using only Cypher.
What you can do
MATCH p =(n)-[:NEXT*]->(end)
WITH collect(p) as node_paths
MATCH (n1)-[:NEXT]->(m)-[:BRANCH]->(n2)
WITH collect(m) as branch_nodes , node_paths
RETURN branch_nodes,node_paths
Now node_paths consists of all the paths with pattern (node)-[:NEXT]->(node)-[:NEXT]->...(node) . Now you have the paths and branch Nodes(starting point of basically all the paths in the node_paths except the one which will be emerging from root node) , you can arrange the output order accordingly.

Resources