Find path between multiple nodes by property - neo4j

So I am trying to find all the nodes between a node of my choosing and a property, called Stop:true. Note: I would like for it to include the stop node.
So If I have a set of nodes like this:
(id:1,Stop:false)-(id:2,Stop:false)-(id:3,Stop:false)-(id:4,Stop:false)-
(id:5,Stop:false)-(id:6,Stop:True)-(id:7,Stop:false)-(id:8,Stop:false)
It would return
(id:1,Stop:false)-(id:2,Stop:false)-(id:3,Stop:false)-(id:4,Stop:false)-
(id:5,Stop:false)-(id:6,Stop:True)
So far I have
MATCH p=(a:Node{id:1})-[*]-(b:Node)
WHERE NOT b.Stop = true
RETURN p
But this query still returns nodes that are connected to the stop node. How do I make it show ALL the nodes up to the stop node?

The following query should return every path from your chosen node to a Stop node (i.e.., a node that has a Stop value of true). If your DB has paths with multiple Stop nodes, then this query would return a path to each Stop node (meaning that some of the returned paths could contain multiple Stop nodes).
MATCH p=(a:Node{id:6})-[*]-(b:Node {Stop: true})
RETURN p;
However, if you only want paths that have a single Stop node (at the end), then this query should work:
MATCH p=(a:Node{id:6})-[*]-(b:Node {Stop: true})
WHERE NONE(n IN NODES(p)[1..-1] WHERE n.Stop)
RETURN p;
[NOTE]
Variable-length path patterns (like ()-[*]-()) have exponential time and space complexity. If the average degree of a node is X, then traversing a variable-length path to depth Y imposes a complexity of O(X^Y). You would normally need to specify a reasonable upper bound on variable-length patterns (e.g., ()-[*..5]-()) to avoid running out of memory or having the query take seemingly forever to run. The upper bound you specify would depend on the nature of your query and your actual data characteristics.

Related

Finding all Leaf Nodes in Neo4j efficiently

I am trying to write a query in Cypher that returns all leaf nodes given a specific root node.
Right now I have been using:
MATCH (root:Node {name: 'Name'})<-[:REL *]-(leaf:Node)
WHERE NOT (leaf)<-[:REL]-()
RETURN leaf
The problem with this query is that as the database becomes larger, it becomes exponentially slower because every single possible leaf node that connects to my root is checked in the not clause. To omit the not clause, I can return the entire path like this:
MATCH p=(root:Node {name: 'Name'})<-[:REL *]-(leaf:Node)
RETURN p
The second query is a lot faster as the number of nodes/relationships in the graph increases, but I would prefer to return just the leaf nodes instead of the path.
Is there a way to run this query more efficiently on a larger data set?
have you tried this?
MATCH (root:Node {name: 'Name'})<-[:REL *]-(leaf:Node)
WITH leaf
WHERE NOT (leaf)<-[:REL]-()
RETURN leaf
On a similar case for me it gives the lowest number of db hits.

Getting a subgraph not crossing through a certain node

I have a huge (non cyclic) graph and want to find all nodes reachable by relation X from a given node. However, I do not want to cross a node having a certain attribute {attr:'donotcross'} as this represents a choke point I do not want to cross (i.e. this is the only node leading to an adjacent subgraph).
Currently I do breadth first search myself using a trivial Cypher query to isolate neighboring nodes and python, stopping the recursion as soon as I reach that specific node. This, however, is really slow and I think that using pure Cypher to isolate those nodes could be faster.
What does the Cypher query look like returning all connected nodes via X but not traversing a node with property attr:'donotcross'?
My intuition would be
MATCH (n)-[:X*]->(inter)-[:X*]->(m) WHERE NOT inter.attr = 'donotcross' RETURN m
With n being the start node. However, this does not work as this pattern can match a path with a forbidden node if there are more than the forbidden node in between the start and target node.
Using Cypher alone, you can use the following approach:
MATCH path = (n)-[:X*]->(m) // best to use a label!
WHERE none(node in nodes(path) WHERE inter.attr = 'donotcross')
RETURN DISTINCT m
Keep in mind you should at least be using labels for your starting node n, if you aren't able to look them up by an indexed property for a specific label.
Also, if there are relatively few of these donotcross nodes, and if there is an index on the label of these nodes on attr, then it may be faster to first match on these nodes, collect them, then filter based on that:
MATCH (x) // best to use a label and index lookup!
WHERE x.attr = 'donotcross'
WITH collect(x) as excluded
MATCH path = (n)-[:X*]->(m) // best to use a label!
WHERE none(node in nodes(path) WHERE node in excluded)
RETURN DISTINCT m

Leaf nodes and paths in cypher

I have a graph in Neo4J that looks like this:
(a {flag:any})<- (many, 0 or more) <-(b {flag:true})<- (many, 0 or more) <-(c {flag: any})
-OR-
(a {flag:any})<- (many, 0 or more) <-(d)
-OR-
(a {flag:any})
Where a, b, c, and d all have the same type, and the relations are also the same. All the nodes have flag:false except where noted. Of course the real graph is a tree, not a vine.
In short, every path should begin with a and end with the first flag=true node, or should begin with a and get all children down to the leaf of the tree. Per the last example, a doesn't have to have any children - it can be a root and a leaf. Finally, in the first case, we'll never pull in c. b stops the traversal.
How can I write this query?
I have gotten it to work with a path and several unwind/collect statements that are basically horse****, lol. I want a better query, but I am so confused now it is not going to happen.
The following query should return all 3 kinds of paths. I assume that all relevant nodes are labeled Foo, and all relevant relationships have the BAR type.
The first term of the WHERE clause looks for paths (of length 0 or more, because of the variable-length relationship pattern used in the MATCH clasue) that end in a node with a true flag with no true flags earlier in the path (except for possibly the starting node). The second term looks for paths (of length 0 or more) ending with a leaf node, where no nodes (except for possibly the starting node) have a true flag.
MATCH p=(a:Foo)<-[:BAR*0..]-(b:Foo)
WHERE
(b.flag AND NONE(x IN NODES(p)[1..-1] WHERE x.flag)) OR
((NOT (b)<-[:BAR]-()) AND NONE(y IN NODES(p)[1..] WHERE y.flag))
RETURN p;
NOTE: Variable-length relationship patterns with no upper bound (like [:BAR*0..]) can be very expensive, and can take a very long time or cause an out of memory error. So, you may need to specify a reasonable upper bound (for example, [:BAR*0..5]).
I would approach this query as the UNION of the two cases:
MATCH shortestPath((a)<-[:REL_TYPE*1..]-(end:Label {flag: true}))
RETURN a, end
UNION
MATCH (a)<-[:REL_TYPE*0..]-(end:Label)
WHERE NOT (end)<-[:REL_TYPE]-()
RETURN a, end
Let's break it down:
To express that we only want to traverse until the first flag is true, we use shortestPath.
To express that we want to traverse down to the leaf, we use the following formalisation: a node is a leaf if it has no relationships that could be continued, captured by a WHERE NOT filter on patterns.
This should give an idea of the basic ideas to use for such queries -- please provide some feedback so that I can refine the answer.

Using Cypher how would one select all nodes connected to a node within exactly one hop whilst excluding the central node from the result?

Take the above image as an example. Using Cypher, how would I match all of the nodes except for the longest chain and the central node? I.e. all nodes within exactly one hop of the central node whilst excluding the central node (all nodes and edges except 3 nodes and 2 edges).
I have tried the following:
MATCH (n:Node) WHERE n.id = "123" MATCH path = (m)-[*1..1]->(n) RETURN m
This very nearly works, however it still returns the central node (i.e. node n). How would I exclude this node from my query result?
[UPDATED]
This will return all distinct nodes directly connected to the specified node, and explicitly prevents the specified node from being returned (in case it has a relationship to itself):
MATCH (n:Node)--(m)
WHERE n.id = "123" AND n <> m
RETURN DISTINCT m;
Ideally I would have liked to match the nodes as mentioned in my question and delete them. However, as I have not found a way to do so an inverse approach can be utilised whereby all nodes but those as mentioned in the question are matched instead. Thereby effectively excluding (but not deleting) the unwanted nodes.
This can be achieved using this query:
MATCH (n:Node) WHERE n.id = "123" MATCH path = (m)-[*2..]->(n) RETURN path
This returns the central node and all paths to that node that have a "length" greater than or equal to 2.

Neo4j deep hierarchy query

I've this kind of data model in the db:
(a)<-[:has_parent]<-(b)-[:has_parent]-(c)<-[:has_parent]-(...)
every parent can have multiple children & this can go on to unknown number of levels.
I want to find these values for every node
the number of descendants it has
the depth [distance from the node] of every descendant
the creation time of every descendant
& I want to rank the returned nodes based on these values. Right now, with no optimization, the query runs very slow (especially when the number of descendants increases).
The Questions:
what can I do in the model to make the query performant (indexing, data structure, ...)
what can I do in the query
what can I do anywhere else?
edit:
the query starts from a specific node using START or MATCH
to clarify:
a. the query may start from any point in the hierarchy, not just the root node
b. every node under the starting node is returned ranked by the total number of descendants it has, the distance (from the returned node) of every descendant & timestamp of every descendant it has.
c. by descendant I mean everything under it, not just it's direct children
for example,
here's a sample graph:
http://console.neo4j.org/r/awk6m2
First you need to know how to find the root node. The following statement finds the nodes having no outboung parent relationship - be aware that statement is potentially expensive in a large graph.
MATCH (n)
WHERE NOT ((n)-[:has_parent]->())
RETURN n
Instead you should use an index to find that node:
MATCH (n:Node {name:'abc'})
Starting with our root node, we traverse inbound parent relationship with variable depth. On each node traversed we calculate the number of children - since this might be zero a OPTIONAL MATCH is used:
MATCH (root:Node) // line 1-3 to find root node, replace by index lookup
WHERE NOT ((root)-[:has_parent]->())
WITH root
MATCH p =(root)<-[:has_parent*]-() // variable path length match
WITH last(nodes(p)) AS currentNode, length(p) AS currentDepth
OPTIONAL MATCH (currentNode)<-[:has_parent]-(c) // tranverse children
RETURN currentNode, currentNode.created, currentDepth, count(c) AS countChildren

Resources