Neo4j: Return only root nodes from possible child paths

I have a set of n values, each of which has a corresponding node in my graph. Initially I don't know how these nodes relate to each other (see the start nodes in blue).
As simply as possible, I want to find out whether any of these nodes are children of any of the others, and then filter the results with these rules:
If the node is a child, discard it (white nodes).
If the node is a root, return it (green nodes).
If the node does not have any children, also return it (green node 673).
There can be up to 50 starting nodes. I've tried iterating through them, comparing two at a time and discarding a node if it is a child, but the number of iterations quickly gets out of hand for larger sets. I'm hoping there is some graph magic I've overlooked. Cypher, please!
Thanks!

Let's say you have an input parameter nids, the set of values for the id property of the nodes; the target nodes have the label Node, and the relationship between nodes is of type hasChild.
Then you need to find the nodes corresponding to the input set that have no parent among the nodes corresponding to the input set:
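UNWIND $nids AS nid
MATCH (N:Node {id: nid})
// look only for parents that are themselves in the input set
OPTIONAL MATCH (N)<-[:hasChild]-(P:Node)
WHERE P.id IN $nids
// keep only the nodes that have no such parent
WITH N, collect(P) AS ts
WHERE size(ts) = 0
RETURN N
And do not forget to add an index on the id property of Node (on Neo4j 3.x the equivalent statement is CREATE INDEX ON :Node(id)):
CREATE INDEX node_id FOR (n:Node) ON (n.id)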
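For intuition about why this scales better than pairwise comparison, here is a minimal Python sketch of the same filter, run in memory rather than in Neo4j. It assumes a hypothetical list of (parent_id, child_id) pairs standing in for the hasChild relationships and, like the query, only looks at direct parents. Each relationship is inspected once, so the work grows with the number of relationships instead of with the number of node pairs.

```python
def roots_among(nids, has_child_edges):
    """Return the ids in nids that have no direct parent inside nids.

    has_child_edges is a hypothetical list of (parent_id, child_id) pairs,
    one per hasChild relationship (an assumed in-memory layout).
    """
    wanted = set(nids)
    # Like OPTIONAL MATCH ... WHERE P.id IN nids: note every input node
    # that has at least one parent which is also an input node.
    has_parent_in_set = {
        child
        for parent, child in has_child_edges
        if parent in wanted and child in wanted
    }
    # Keep the input order; a node with no parent in the set is kept whether
    # or not it has children (like node 673 in the question).
    return [nid for nid in nids if nid not in has_parent_in_set]


edges = [(1, 2), (2, 3), (1, 4), (9, 1)]
print(roots_among([1, 2, 3, 673], edges))  # [1, 673]
```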

Related

How can I limit the nodes on each level of a tree with a Cypher query?

I'm using Cypher and Neo4j.
I have a big dataset of parent-child relations of the form
(:Person)-[:PARENT_OF*]-(:Person)
I need to get the family tree with only 5 children (nodes) on each level of the tree.
I've tried:
MATCH path = (jon:Person )-[:PARENT_OF*]-(:Person)
WITH collect(path) as paths
CALL apoc.convert.toTree(paths) yield value
RETURN value;
It's returning the whole tree structure. I've tried limiting the nodes with LIMIT, but it isn't working properly.
I guess you have to filter the paths first. My approach would be to make sure that all children in the path are among the first 5 child nodes of their parent. I don't have a dataset ready to test it, but it could be along these lines:
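MATCH path = (jon:Person {name: 'jon'})-[:PARENT_OF*]->(leaf:Person)
// limit the number of paths by only considering those ending at a leaf, i.e. a person who is not the parent of someone else
WHERE NOT (leaf)-[:PARENT_OF]->(:Person)
// and all nodes in the path (except the first one, the root) must be among the first five children of their parent.
// The parent's children are matched separately from the (parent)->(child) check, because within one pattern
// relationship uniqueness would stop sibling from ever being the same node as child.
AND
ALL(child in nodes(path)[1..] WHERE child IN [(parent)-[:PARENT_OF]->(sibling) WHERE (parent)-[:PARENT_OF]->(child) | sibling][..5])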
WITH collect(path) as paths
CALL apoc.convert.toTree(paths) yield value
RETURN value
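The rule this query encodes can be illustrated with a small Python sketch, assuming a hypothetical dict that maps each person to an ordered list of their children (list order stands in for "first", which Cypher's collected lists do not guarantee). It walks from the root and only follows the first k children of each node, which yields the same root-to-leaf paths as filtering every full path afterwards, provided the tree has no cycles.

```python
def limited_paths(children, root, k=5):
    """Return root-to-leaf paths (at least one hop) in which every node after
    the root is among the first k children of its parent.

    children is a hypothetical dict mapping a person to an ordered list of
    their children; the tree is assumed to contain no cycles.
    """
    paths = []
    stack = [[root]]
    while stack:
        path = stack.pop()
        kids = children.get(path[-1], [])
        if not kids:
            # A leaf, as in WHERE NOT (leaf)-[:PARENT_OF]->(:Person);
            # [:PARENT_OF*] needs at least one hop, so a bare root is skipped.
            if len(path) > 1:
                paths.append(path)
            continue
        # Only the first k children are followed: the ALL(...) check applied
        # while walking instead of after enumerating every path.
        for kid in kids[:k]:
            stack.append(path + [kid])
    return paths


family = {"jon": ["a", "b", "c"], "a": ["a1", "a2"]}
print(sorted(limited_paths(family, "jon", k=2)))
# [['jon', 'a', 'a1'], ['jon', 'a', 'a2'], ['jon', 'b']]
```

Pruning during the walk avoids materializing paths that would be discarded anyway, which is the same motivation as the second approach.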
Another approach, perhaps faster, would be to first collect the first five children of jon and of each of his descendants:
// find all sets of "firstFiveChildren"
MATCH (jon:Person { name:'jon'}),
(p:Person)-[:PARENT_OF]->(child)
// *0.. lets p also be jon himself; otherwise jon's own children never end up in personsInTree
WHERE EXISTS((jon)-[:PARENT_OF*0..]->(p))
WITH jon,p,COLLECT(child)[..5] AS firstFiveChildren
// create a unique list of the persons that could be part of the tree
WITH jon,apoc.coll.toSet(
apoc.coll.flatten(
[jon]+COLLECT(firstFiveChildren)
)
) AS personsInTree
MATCH path = (jon)-[:PARENT_OF*]->(leaf:Person)
WHERE NOT (leaf)-[:PARENT_OF]->(:Person)
AND ALL(node in nodes(path) WHERE node IN personsInTree)
WITH collect(path) as paths
CALL apoc.convert.toTree(paths) yield value
RETURN value;
UPDATE
The issue with the data is that the tree is not symmetric, i.e. not all paths have the same depth. Node d0, for instance, has no children. So if you pick five children at the first level, you may not get any deeper.
I added a slightly different approach that should work with symmetric trees and lets you set the maximum number of children per node. Try it with 3 and you will see that you only get nodes from the first level; with 8 you get more.
// find all sets of "firstChildren"
WITH 8 AS numberChildren
MATCH (jon:Person { name:'00'}),
(p:Person)-[:PARENT_OF]->(child)
WHERE EXISTS((jon)-[:PARENT_OF*0..]->(p))
WITH jon,p,COLLECT(child)[..numberChildren] AS firstChildren
// create a unique list of the persons that could be part of the tree
WITH jon,apoc.coll.toSet(
apoc.coll.flatten(
[jon]+COLLECT(firstChildren)
)
) AS personsInTree
MATCH path = (jon)-[:PARENT_OF*]->(leaf:Person)
WHERE NOT (leaf)-[:PARENT_OF]->(:Person)
AND ALL(node in nodes(path) WHERE node IN personsInTree)
WITH collect(path) as paths
CALL apoc.convert.toTree(paths) yield value
RETURN value
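The whitelist idea in this query can be mirrored in Python to see how numberChildren interacts with an asymmetric tree. This is only a sketch: the children dict (person to ordered list of children) is a hypothetical stand-in for the PARENT_OF relationships, the tree is assumed acyclic, and list order again stands in for the unspecified order of COLLECT.

```python
def persons_in_tree(children, root, number_children=8):
    """Whitelist: root plus the first number_children children of every
    node reachable from root (the *0.. step also includes root itself)."""
    allowed = {root}
    seen = {root}
    stack = [root]
    while stack:
        p = stack.pop()
        kids = children.get(p, [])
        allowed.update(kids[:number_children])
        for kid in kids:
            if kid not in seen:
                seen.add(kid)
                stack.append(kid)
    return allowed


def tree_paths(children, root, number_children=8):
    """Root-to-leaf paths (at least one hop) whose nodes are all whitelisted."""
    allowed = persons_in_tree(children, root, number_children)
    paths = []
    stack = [[root]]
    while stack:
        path = stack.pop()
        kids = children.get(path[-1], [])
        if not kids:
            # Same filter as ALL(node in nodes(path) WHERE node IN personsInTree)
            if len(path) > 1 and all(node in allowed for node in path):
                paths.append(path)
            continue
        for kid in kids:
            stack.append(path + [kid])
    return paths


sample = {"00": ["d0", "d1", "d2"], "d1": ["e0", "e1"]}
print(tree_paths(sample, "00", 1))  # [['00', 'd0']]
```

With a limit of 1, only d0 survives at the first level, and since d0 has no children the result never goes deeper; with a limit of 2, the d1 branch and its children come through as well.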

Neo4j: How can I find out whether a path traversing multiple nodes given in a list exists?

I have a graph of nodes connected by a relationship NEXT with 2 properties: sequence (s) and position (p). For example:
(N1)-[:NEXT {s:1, p:2}]->(N2)-[:NEXT {s:1, p:3}]->(N3)-[:NEXT {s:1, p:4}]->(N4)
A node N might have multiple outgoing NEXT relationships with different property values.
Given a list of node names representing a sequential path, e.g. [N2,N3,N4], I want to check whether the graph contains those nodes and whether they are connected by NEXT relationships in that order.
For example, if the list contains [N2,N3,N4], check that there is a NEXT relationship from N2 to N3 and from N3 to N4.
In addition, I want to make sure that the nodes are part of the same sequence, i.e. the property s is the same for every NEXT relationship. To ensure the order is maintained, I also need to verify that the property p increases by one at each step: the value of p on N2 -> N3 is 3, the value of p on N3 -> N4 is 3 + 1 = 4, and so on.
I tried using APOC from Python (library: neo4jrestclient) to retrieve the possible paths from an initial node N, and then processing the paths manually to check whether a sequence exists, using the following query:
q = "MATCH (n:Node) WHERE n.name = 'N' CALL apoc.path.expandConfig(n, {relationshipFilter:'NEXT>', maxLevel:4}) YIELD path RETURN path"
results = db.query(q,data_contents=True)
However, the query took so long that I eventually stopped it. Any ideas?
This one is a bit tough.
First, pre-match the nodes in the path. We can then use the collected nodes as a whitelist for the nodes in the path.
Assuming the start node is included in the list, a query might look like this:
UNWIND $names as name
MATCH (n:Node {name:name})
WITH collect(n) as nodes
WITH nodes, nodes[0] as start, tail(nodes) as tail, size(nodes)-1 as depth
CALL apoc.path.expandConfig(start, {whitelistNodes:nodes, minLevel:depth, maxLevel:depth, relationshipFilter:'NEXT>'}) YIELD path
WHERE all(index in range(0, size(nodes)-1) WHERE nodes[index] = nodes(path)[index])
// we now have only paths with the given nodes in order
WITH path, relationships(path)[0].s as sequence
WHERE all(rel in tail(relationships(path)) WHERE rel.s = sequence)
// now each path only has relationships of common sequence
WITH path, apoc.coll.pairsMin([rel in relationships(path) | rel.p]) as pairs
WHERE all(pair in pairs WHERE pair[0] + 1 = pair[1])
RETURN path
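To make the three conditions (order, shared s, p increasing by one) concrete, here is a small Python sketch that checks them against an in-memory list of relationships. The (from_name, to_name, s, p) tuples are a hypothetical stand-in for the NEXT relationships; the check expects at least two names, i.e. at least one hop.

```python
def sequence_path_exists(names, next_rels):
    """Return True if names are joined in order by NEXT relationships that
    share one s value and whose p grows by exactly 1 per hop.

    next_rels is a hypothetical list of (from_name, to_name, s, p) tuples,
    one per NEXT relationship.
    """
    if len(names) < 2:
        raise ValueError("need at least two names (one hop)")
    # Group the (s, p) values by node pair; a pair may have several NEXT rels.
    by_pair = {}
    for src, dst, s, p in next_rels:
        by_pair.setdefault((src, dst), []).append((s, p))
    # Each NEXT rel on the first hop fixes a candidate s and starting p;
    # hop i must then carry exactly (s, p + i).
    return any(
        all(
            (s, p + i) in by_pair.get((names[i], names[i + 1]), [])
            for i in range(1, len(names) - 1)
        )
        for s, p in by_pair.get((names[0], names[1]), [])
    )


rels = [("N1", "N2", 1, 2), ("N2", "N3", 1, 3), ("N3", "N4", 1, 4)]
print(sequence_path_exists(["N2", "N3", "N4"], rels))  # True
```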

Using Cypher how would one select all nodes connected to a node within exactly one hop whilst excluding the central node from the result?

Take the above image as an example. Using Cypher, how would I match all of the nodes except for the longest chain and the central node? I.e. all nodes within exactly one hop of the central node whilst excluding the central node (all nodes and edges except 3 nodes and 2 edges).
I have tried the following:
MATCH (n:Node) WHERE n.id = "123" MATCH path = (m)-[*1..1]->(n) RETURN m
This very nearly works, however it still returns the central node (i.e. node n). How would I exclude this node from my query result?
[UPDATED]
This will return all distinct nodes directly connected to the specified node, and explicitly prevents the specified node from being returned (in case it has a relationship to itself):
MATCH (n:Node)--(m)
WHERE n.id = "123" AND n <> m
RETURN DISTINCT m;
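The same one-hop rule can be written as a Python sketch over a hypothetical list of (start, end) relationship pairs: direction is ignored, as with the undirected -- pattern; duplicates collapse, as with DISTINCT; and a self-loop on the central node never puts that node in the result, as with n <> m.

```python
def one_hop_neighbors(center, edges):
    """Distinct nodes joined to center by one relationship in either
    direction, never including center itself.

    edges is a hypothetical list of (start, end) pairs.
    """
    result = set()
    for a, b in edges:
        if a == center and b != center:
            result.add(b)
        elif b == center and a != center:
            result.add(a)
    return result


edges = [("123", "a"), ("b", "123"), ("123", "123"), ("a", "c")]
print(sorted(one_hop_neighbors("123", edges)))  # ['a', 'b']
```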
Ideally, I would have liked to match the nodes mentioned in my question and delete them. However, as I have not found a way to do so, an inverse approach can be used: match all nodes except those mentioned in the question, thereby effectively excluding (but not deleting) the unwanted nodes.
This can be achieved using this query:
MATCH (n:Node) WHERE n.id = "123" MATCH path = (m)-[*2..]->(n) RETURN path
This returns the central node and all paths to that node that have a "length" greater than or equal to 2.

Cypher Query To Ensure All Nodes are in path but other paths can exist

I have a graph but need to be sure that all nodes are in the path (but more than those nodes can exist in the path).
Here is an example (sorry, had to black some stuff out):
I want to find end2 and not end1 when the list I pass in contains the values of the same property for all three intermediary nodes, but I can't write a query that returns end2 without end1. There can be more nodes that have the same routes, but I will only ever pass in distinct values that are not duplicated across the middle nodes. Does anyone know a query that will give me only the end node that is connected to all of the intermediary nodes whose values I pass in? There are also nodes that hang off those end nodes, and some of them interlink between end1 and end2. Others do not, and those are the nodes I do not want. Because yellow and blue both have paths to end1 I can't use ANY, and because there are other paths to those same nodes (not pictured) I can't use ALL either.
Thanks in advance for your help.
[Update]
Here is the query I currently use, but it only allows one "end" node per start node, and I want multiple. I needed the id(eg)={eg_id} condition passed in, but that limits it to one. I would much rather rely on the fact that every a in the path below must match the list of name properties: the middle nodes that must be present determine which end node is reached. So if yellow and blue are there, then end1 and end2 would come back, but if yellow, blue and purple are there, then only end2 would come back.
start td = node({td_id})
match (td:Start)-[:Rel1]->(a)<-[:Rel2]-(eg:End)-[es:Rel3]->(n:WhatsPastEnd)
with collect(a.name) as pnl, n, td, eg, es
where id(eg) = {eg_id}
and all(param_needs in {param_name_list} where param_needs in pnl)
return n
order by es.order
[SOLVED]
Thank you very much InverseFalcon, got what I need from the solution below!
Okay, let's modify your query and drop the matching on the end node id.
start td = node({td_id})
// unwind your list of names so every row has a name
with td, {param_name_list} as param_names
unwind param_names as param_name
match (td:Start)-[:Rel1]->(a)
where a.name = param_name
// now a has all nodes with the required names
// collect then unwind so we have the full collection for each a
with collect(a) as requiredNodes
unwind requiredNodes as a
match (a)<-[:Rel2]-(eg:End)
where all(node in requiredNodes where (node)<-[:Rel2]-(eg))
// each eg is matched once per required node, so remove duplicates before expanding
with distinct eg
match (eg)-[es:Rel3]->(n:WhatsPastEnd)
return n
order by es.order
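The core of this query, keeping only the end nodes that are linked to every required middle node, amounts to a set intersection. Here is a Python sketch, assuming hypothetical lists of (start, middle_name) pairs for Rel1 and (end, middle_name) pairs for Rel2, with middle nodes identified by their name property; the colors reuse the example from the question.

```python
def matching_end_nodes(start, param_names, rel1, rel2):
    """End nodes linked via Rel2 to every middle node that start reaches via
    Rel1 and whose name is in param_names.

    rel1: hypothetical (start, middle_name) pairs for Rel1.
    rel2: hypothetical (end, middle_name) pairs for Rel2.
    """
    wanted = set(param_names)
    required = {m for s, m in rel1 if s == start and m in wanted}
    if not required:
        return set()  # the query returns no rows when nothing matches
    ends_by_middle = {}
    for end, m in rel2:
        ends_by_middle.setdefault(m, set()).add(end)
    # An end node qualifies only if it is linked to every required middle node.
    return set.intersection(*(ends_by_middle.get(m, set()) for m in required))


rel1 = [("td", "yellow"), ("td", "blue"), ("td", "purple")]
rel2 = [("end1", "yellow"), ("end1", "blue"),
        ("end2", "yellow"), ("end2", "blue"), ("end2", "purple")]
print(matching_end_nodes("td", ["yellow", "blue", "purple"], rel1, rel2))  # {'end2'}
```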

Neo4j deep hierarchy query

I have this kind of data model in the database:
(a)<-[:has_parent]-(b)<-[:has_parent]-(c)<-[:has_parent]-(...)
Every parent can have multiple children, and this can go on for an unknown number of levels.
I want to find these values for every node:
the number of descendants it has
the depth [distance from the node] of every descendant
the creation time of every descendant
I want to rank the returned nodes based on these values. Right now, with no optimization, the query runs very slowly (especially as the number of descendants increases).
The Questions:
what can I do in the model to make the query performant (indexing, data structure, ...)
what can I do in the query
what can I do anywhere else?
edit:
the query starts from a specific node using START or MATCH
to clarify:
a. the query may start from any point in the hierarchy, not just the root node
b. every node under the starting node is returned, ranked by the total number of descendants it has, the distance (from the returned node) of every descendant, and the timestamp of every descendant it has.
c. by descendant I mean everything under it, not just its direct children
for example,
here's a sample graph:
http://console.neo4j.org/r/awk6m2
First you need to know how to find the root node. The following statement finds the nodes that have no outbound has_parent relationship; be aware that this statement is potentially expensive in a large graph.
MATCH (n)
WHERE NOT ((n)-[:has_parent]->())
RETURN n
Instead you should use an index to find that node:
MATCH (n:Node {name:'abc'})
Starting with our root node, we traverse inbound has_parent relationships with variable depth. For each node traversed we count its children; since this might be zero, an OPTIONAL MATCH is used:
MATCH (root:Node) // lines 1-3 find the root node; replace them with an index lookup
WHERE NOT ((root)-[:has_parent]->())
WITH root
MATCH p =(root)<-[:has_parent*]-() // variable path length match
WITH last(nodes(p)) AS currentNode, length(p) AS currentDepth
OPTIONAL MATCH (currentNode)<-[:has_parent]-(c) // traverse children
RETURN currentNode, currentNode.created, currentDepth, count(c) AS countChildren
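The traversal above can be sketched in Python as a breadth-first walk from the chosen start node, reporting each descendant's depth and direct-child count, as the currentDepth and countChildren columns do. The child-to-parent dict is a hypothetical stand-in for the has_parent relationships and assumes each node has at most one parent; because the walk can begin at any node, it also covers starting below the root.

```python
from collections import deque


def depth_and_child_counts(parent_of, start):
    """Map every node below start to (depth, number of direct children).

    parent_of is a hypothetical dict child -> parent built from the
    has_parent relationships (one parent per node is assumed).
    """
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)
    result = {}
    # Depth 1 for direct children, like length(p) on a one-hop path.
    queue = deque((child, 1) for child in children.get(start, []))
    while queue:
        node, depth = queue.popleft()
        kids = children.get(node, [])
        result[node] = (depth, len(kids))
        queue.extend((kid, depth + 1) for kid in kids)
    return result


parent_of = {"b": "a", "c": "a", "d": "b", "e": "b", "f": "d"}
print(depth_and_child_counts(parent_of, "a"))
```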
