Using CYPHER to find nodes outside a reporting chain - neo4j

I have included a picture to make this question easier to understand.
I have a CYPHER query where I start with Manager A and I want to find out:
a) all the staff that :REPORTS_TO them in the reporting chain. Using diagram, answer: B, C, D, B1, B2, B11, D1, D2
b) who the reports :KNOWS who are not in the reporting chain. Using diagram, answer: Z, W, X
I am able to answer a) but not b) without including C, D1 and D2.
Does anyone know how to solve this problem?
I tried running a query to find the reports and then piping the results into a second query using the WITH clause, but I have been unable to exclude C, D1 and D2.
Object network

I think I found a query that works, though it only returns the nodes outside the reporting chain. It might be difficult to return both a) and b) in the same query.
First, creating the graph (should match your diagram, except I'm using lowercase):
merge (a:Person{name:"a"})
merge (b:Person{name:"b"})
merge (c:Person{name:"c"})
merge (d:Person{name:"d"})
merge (b1:Person{name:"b1"})
merge (b2:Person{name:"b2"})
merge (b11:Person{name:"b11"})
merge (d1:Person{name:"d1"})
merge (d2:Person{name:"d2"})
merge (z:Person{name:"z"})
merge (w:Person{name:"w"})
merge (x:Person{name:"x"})
merge (b)-[:REPORTS_TO]->(a)
merge (c)-[:REPORTS_TO]->(a)
merge (d)-[:REPORTS_TO]->(a)
merge (b1)-[:REPORTS_TO]->(b)
merge (b2)-[:REPORTS_TO]->(b)
merge (b11)-[:REPORTS_TO]->(b1)
merge (d1)-[:REPORTS_TO]->(d)
merge (d2)-[:REPORTS_TO]->(d)
merge (z)-[:KNOWS]->(b)
merge (z)-[:KNOWS]->(b11)
merge (b2)-[:KNOWS]->(w)
merge (b2)-[:KNOWS]->(c)
merge (b2)-[:KNOWS]->(d1)
merge (d1)-[:KNOWS]->(x)
merge (c)-[:KNOWS]->(d2)
Now for the query
MATCH (:Person{name:"a"})<-[:REPORTS_TO*]-(reporter:Person)-[:KNOWS]-(other:Person)
WITH COLLECT(reporter) AS reporters, COLLECT(other) AS others
WITH [o IN others WHERE NOT o IN reporters] AS outsiders
UNWIND outsiders AS outsider
RETURN DISTINCT outsider
This will return people Z, W, and X. There's probably a more elegant solution, but I haven't stumbled on it yet. Maybe it's hiding in plain sight?
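The logic of that query (collect the reporting chain, collect everyone linked to it by KNOWS, then subtract the chain) can also be checked outside the database. The following is only a Python sketch of the same set arithmetic, not Cypher; the function names `reporting_chain` and `outsiders` are made up for illustration, and the edge lists are copied from the MERGE statements above.

```python
from collections import deque

# Edges copied from the MERGE statements above (report -> manager).
REPORTS_TO = [("b", "a"), ("c", "a"), ("d", "a"), ("b1", "b"), ("b2", "b"),
              ("b11", "b1"), ("d1", "d"), ("d2", "d")]
KNOWS = [("z", "b"), ("z", "b11"), ("b2", "w"), ("b2", "c"),
         ("b2", "d1"), ("d1", "x"), ("c", "d2")]


def reporting_chain(manager):
    """Everyone who reports to `manager`, directly or indirectly."""
    direct_reports = {}
    for report, boss in REPORTS_TO:
        direct_reports.setdefault(boss, []).append(report)
    seen, queue = set(), deque([manager])
    while queue:
        for report in direct_reports.get(queue.popleft(), []):
            if report not in seen:
                seen.add(report)
                queue.append(report)
    return seen


def outsiders(manager):
    """People linked by KNOWS to someone in the chain, minus the chain itself."""
    chain = reporting_chain(manager)
    known = set()
    for src, dst in KNOWS:  # KNOWS is treated as undirected, as in the query
        if src in chain:
            known.add(dst)
        if dst in chain:
            known.add(src)
    return known - chain


print(sorted(reporting_chain("a")))
print(sorted(outsiders("a")))
```

Both answers, a) and b), come out of the same traversal here, which suggests that returning them together in one query mainly requires keeping the collected chain in scope.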

Related

How to return nodes that have only one given relationship

I have nodes that represent documents, and nodes that represent entities. Entities can be referenced in documents; if so, they are linked together with a relationship like this:
(doc)<-[:IS_REFERENCED_IN]-(entity)
The same entity can be referenced in several documents, and a document can reference several entities.
For a given document, I'd like to delete every entity that is referenced in that document only.
I thought of two different ways to do this.
The first one uses java to make a foreach and would basically be something like that :
List<Entity> entities = MATCH (d:Document {id:0})<-[:IS_REFERENCED_IN]-(e:Entity) return e
for (Entity entity : entities){
MATCH (e:Entity)-[r:IS_REFERENCED_IN]->(d:Document) WITH *, count(r) as nb_document_linked WHERE nb_document_linked = 1 DELETE e
}
This method would work, but I'd rather not use a foreach or Java code to do it. I'd like to do it in one Cypher query.
The second one uses only one cypher query but doesn't work. It's something like that :
MATCH (d:Document {id:0})<-[:IS_REFERENCED_IN]-(e:Entity)-[r:IS_REFERENCED_IN]->(d:Document) WITH *, count(r) as nb_document_linked WHERE nb_document_linked = 1 DELETE e
The problem here is that nb_document_linked is not computed per entity: it is a single value for all the entities, which means it counts every relationship of every entity, which I don't want.
So how could I make a kind of a foreach in my cypher query to make it work?
Sorry for my english, I hope the question is clear, if you need any information please ask me.
You can do something like:
MATCH (d:Document{key:1})<-[:IS_REFERENCED_IN]-(e:Entity)
WITH e
MATCH (d:Document)<-[:IS_REFERENCED_IN]-(e)
WITH COUNT (d) AS countD, e
WHERE countD=1
DETACH DELETE e
Which you can see working on this sample data:
MERGE (a:Document {key: 1})
MERGE (b:Document {key: 2})
MERGE (c:Document {key: 3})
MERGE (d:Entity{key: 4})
MERGE (e:Entity{key: 5})
MERGE (f:Entity{key: 6})
MERGE (g:Entity{key: 7})
MERGE (h:Entity{key: 8})
MERGE (i:Entity{key: 9})
MERGE (j:Entity{key: 10})
MERGE (k:Entity{key: 11})
MERGE (l:Entity{key: 12})
MERGE (m:Entity{key: 13})
MERGE (d)-[:IS_REFERENCED_IN]-(a)
MERGE (e)-[:IS_REFERENCED_IN]-(a)
MERGE (f)-[:IS_REFERENCED_IN]-(a)
MERGE (g)-[:IS_REFERENCED_IN]-(a)
MERGE (d)-[:IS_REFERENCED_IN]-(b)
MERGE (e)-[:IS_REFERENCED_IN]-(b)
MERGE (f)-[:IS_REFERENCED_IN]-(c)
MERGE (g)-[:IS_REFERENCED_IN]-(c)
MERGE (j)-[:IS_REFERENCED_IN]-(a)
MERGE (h)-[:IS_REFERENCED_IN]-(a)
MERGE (i)-[:IS_REFERENCED_IN]-(a)
MERGE (g)-[:IS_REFERENCED_IN]-(c)
MERGE (k)-[:IS_REFERENCED_IN]-(c)
MERGE (l)-[:IS_REFERENCED_IN]-(c)
MERGE (m)-[:IS_REFERENCED_IN]-(c)
On this data, the query removes 3 entities (keys 8, 9 and 10).
The first MATCH finds the entities that are attached to your input doc, and the second MATCH finds the number of documents that each of these entities is connected to.
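The same two-step idea (group the references by entity, then keep the entities whose only document is the input one) can be sanity-checked against the sample data without a database. This is a Python sketch rather than Cypher; `entities_only_in` is a hypothetical helper name, and the pairs are copied from the MERGE statements above.

```python
from collections import defaultdict

# (entity key, document key) pairs copied from the sample data above.
REFERENCED_IN = [(4, 1), (5, 1), (6, 1), (7, 1), (4, 2), (5, 2), (6, 3),
                 (7, 3), (10, 1), (8, 1), (9, 1), (11, 3), (12, 3), (13, 3)]


def entities_only_in(doc_key):
    """Entities whose complete set of referencing documents is {doc_key}."""
    docs_per_entity = defaultdict(set)
    for entity, doc in REFERENCED_IN:
        docs_per_entity[entity].add(doc)  # per-entity grouping, like WITH COUNT(d), e
    return {e for e, docs in docs_per_entity.items() if docs == {doc_key}}


print(sorted(entities_only_in(1)))
```

The important detail is that the count is grouped by entity, which is what `WITH COUNT (d) AS countD, e` achieves by carrying `e` alongside the aggregate.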

Neo4j Cypher complex query optimization

I have a graph with millions of nodes and millions of directed relationships between them.
Suppose each node has one of two states, A or B. I want to find all state A nodes that have no state B node anywhere on their upstream or downstream paths.
As shown in the figure below, there are nodes A to K; three of them (E, G and J) are of type B, and the others are of type A.
picture link is https://i.stack.imgur.com/a0yOV.jpg
For node E, its upstream and downstream traversal is shown below, so nodes B, H, K do not meet the requirements.
For node G, its upstream and downstream traversal is shown below, so nodes B, D, K do not meet the requirements.
For node J, its upstream and downstream traversal is shown below, so nodes A, B, C, D, F do not meet the requirements.
So finally only node "I" is the node that meets the requirements.
picture link is https://i.stack.imgur.com/A2eqv.jpg
The example above is a DAG, but the real graph may contain cycles, including self-loops (case 1), AB cycles (case 2), large loops (case 3), and complex cycles (case 4).
picture link is https://i.stack.imgur.com/NDpED.jpg
The Cypher query I came up with is:
MATCH (n:A)
WHERE NOT exists((n)-[*]->(:B))
AND NOT exists((n)<-[*]-(:B))
RETURN n;
But this query hangs on a graph with millions of nodes and millions of edges, even with a LIMIT of 35, although more than 30,000 nodes ultimately meet the requirements.
Obviously my query uses too much memory: returning just 30-odd nodes takes up almost all the available memory. How can I write a more efficient query?
Here is an example:
CREATE (a:A{id:'a'})
CREATE (b:A{id:'b'})
CREATE (c:A{id:'c'})
CREATE (d:A{id:'d'})
CREATE (e:B{id:'e'})
CREATE (f:A{id:'f'})
CREATE (g:B{id:'g'})
CREATE (h:A{id:'h'})
CREATE (i:A{id:'i'})
CREATE (j:B{id:'j'})
CREATE (k:A{id:'k'})
MERGE (a)-[:REF]->(c)
MERGE (b)-[:REF]->(c)
MERGE (b)-[:REF]->(d)
MERGE (b)-[:REF]->(e)
MERGE (c)-[:REF]->(f)
MERGE (d)-[:REF]->(g)
MERGE (e)-[:REF]->(g)
MERGE (e)-[:REF]->(h)
MERGE (f)-[:REF]->(i)
MERGE (f)-[:REF]->(j)
MERGE (f)-[:REF]->(k)
MERGE (g)-[:REF]->(k)
MERGE (g)-[:REF]->(j)
Running this query returns the result 'i':
MATCH (n:A)
WHERE NOT exists((n)-[*]->(:B))
AND NOT exists((n)<-[*]-(:B))
RETURN n;
But when there are 800,000 nodes (400,000 of type A, 400,000 of type B) and over 1.4 million edges in the graph, this query never returns a result.
Some thoughts:
I don’t think this global graph search can be solved with a single query. You will need some kind of process that optimises exploration and reuses the results found up to a certain point in subsequent steps.
If you can assign node labels instead of properties to reflect the state of a node, you can use apoc.path.expandConfig to explore paths only until you hit a node with state B.
You don’t need to re-investigate state A nodes that you traverse before you hit a node with state B, because they will not meet the requirements.
Another approach, given that all nodes on the upstream or downstream paths from a B node will not fulfil the requirements, could be the following. It still assumes that you use labels to distinguish A and B nodes.
MATCH (b:B)
CALL apoc.path.spanningTree(b,
{relationshipFilter: "<",
labelFilter:"/B"
}
) YIELD path
UNWIND nodes(path) AS downStreamNode
WITH b,COLLECT(DISTINCT downStreamNode) AS downStreamNodes
CALL apoc.path.spanningTree(b,
{relationshipFilter: ">",
labelFilter:"/B"}
) YIELD path
UNWIND nodes(path) AS upStreamNode
WITH b,downStreamNodes+COLLECT(DISTINCT upStreamNode) AS upAndDownStreamNodes
RETURN apoc.coll.toSet(apoc.coll.flatten(COLLECT(upAndDownStreamNodes))) AS allNodesThatDoNotFulfillRequirements
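The underlying idea, that every node reachable from a B node in either direction is disqualified, can be sketched as two breadth-first searches that start from all B nodes at once. The following is a plain Python illustration on the example data, not an APOC or Cypher implementation; `reachable` and `clean_a_nodes` are hypothetical names. Each node is visited at most once per direction, so cycles and self-loops do not cause repeated work.

```python
from collections import deque

# Labels and REF edges copied from the example CREATE/MERGE statements.
LABELS = {"a": "A", "b": "A", "c": "A", "d": "A", "e": "B", "f": "A",
          "g": "B", "h": "A", "i": "A", "j": "B", "k": "A"}
REF = [("a", "c"), ("b", "c"), ("b", "d"), ("b", "e"), ("c", "f"),
       ("d", "g"), ("e", "g"), ("e", "h"), ("f", "i"), ("f", "j"),
       ("f", "k"), ("g", "k"), ("g", "j")]


def reachable(starts, adjacency):
    """All nodes reachable from any start node (start nodes included)."""
    seen = set(starts)
    queue = deque(starts)
    while queue:
        for nxt in adjacency.get(queue.popleft(), ()):
            if nxt not in seen:  # the visited set makes cycles harmless
                seen.add(nxt)
                queue.append(nxt)
    return seen


def clean_a_nodes():
    """A nodes with no B node anywhere upstream or downstream."""
    forward, backward = {}, {}
    for src, dst in REF:
        forward.setdefault(src, []).append(dst)
        backward.setdefault(dst, []).append(src)
    b_nodes = [n for n, label in LABELS.items() if label == "B"]
    tainted = reachable(b_nodes, forward) | reachable(b_nodes, backward)
    return {n for n, label in LABELS.items() if label == "A" and n not in tainted}


print(sorted(clean_a_nodes()))
```

Compared with testing every A node separately, as the `NOT exists(...)` query does, this starts from the B nodes and touches each node and edge a bounded number of times.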

sum 2 graphs in neo4j with multiple MERGE

In brief: how can we MERGE multiple nodes and relationships the way we can with MATCH and CREATE? Multiple CREATE or MATCH patterns for nodes or relationships can be separated with commas, but this is not allowed with MERGE.
In detail: suppose I have two graphs:
G1: (a)-[r1]->(b)<-[r2]-(c)
G2: (a)-[r1]->(b)<-[r3]-(d)
G1 is already stored in Neo4j, and G2 is ready to be pushed to the database. The normal way to do this is to merge each node pair and then merge the relationship. In this example, merging r1 changes nothing in the database, since G1 already has that relationship; for the second one, my Cypher first creates node d and then adds relationship r3.
Is there a way to push G2 to the database in one step? Something like:
MERGE (a), (b), (c), (a)-[r1]->(b)<-[r3]-(d)
to create such result:
(a)-[r1]->(b)<-[r2]-(c)
           ^
           |
          [r3]
           |
          (d)
Not with a single MERGE statement. You would need to follow the pattern of doing a MERGE for each node, then a MERGE for each relationship.
That said, Neo4j does use transactions, so while this is broken into multiple clauses in your Cypher query, the transaction is applied atomically when committed.
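To see why merging node by node and then relationship by relationship gives the desired combined graph, here is a small Python sketch that treats the database as two sets and applies one "merge" per element. It is an analogy only, not Neo4j code; `merge_graph` is a hypothetical helper, and each relationship is modelled as a `(start, type, end)` tuple.

```python
def merge_graph(db_nodes, db_rels, new_nodes, new_rels):
    """Add only the missing nodes, then only the missing relationships."""
    nodes, rels = set(db_nodes), set(db_rels)
    created = []
    for node in new_nodes:  # one MERGE per node
        if node not in nodes:
            nodes.add(node)
            created.append(node)
    for rel in new_rels:  # then one MERGE per relationship
        if rel not in rels:
            rels.add(rel)
            created.append(rel)
    return nodes, rels, created


# G1 is already stored; G2 is being pushed.
g1_nodes, g1_rels = ["a", "b", "c"], [("a", "r1", "b"), ("c", "r2", "b")]
g2_nodes, g2_rels = ["a", "b", "d"], [("a", "r1", "b"), ("d", "r3", "b")]

nodes, rels, created = merge_graph(g1_nodes, g1_rels, g2_nodes, g2_rels)
print(sorted(nodes))
print(created)
```

Only node d and relationship r3 are newly created; a, b and r1 are matched and left untouched.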

Neo4J/Cypher Filter nodes based on multiple relationships

Using Neo4J and Cypher:
Given the diagram below, I want to be able to start at node 'A' and get all the children that have a 'ChildOf' relationship with 'A', but not an 'InactiveChildOf' relationship. So, in this example, I would get back A, C and G. Also, a node can get a new parent ('H' in the diagram) and if I ask for the children of 'H', I should get B, D and E.
I have tried
match (p:Item{name:'A'}) -[:ChildOf*]-(c:Item) where NOT (p)-[:InactiveChildOf]-(c) return p,c
however, that also returns D and E.
Also tried:
match (p:Item{name:'A'}) -[rels*]-(c:Item) where None (r in rels where type(r) = 'InactiveChildOf') return p,c
But that returns all.
Hopefully, this is easy for Neo4J and I am just missing something obvious. Appreciate the help!
Example data:
MERGE (a:Item {name:'A'})
MERGE (b:Item {name:'B'})
MERGE (c:Item {name:'C'})
MERGE (d:Item {name:'D'})
MERGE (e:Item {name:'E'})
MERGE (f:Item {name:'F'})
MERGE (g:Item {name:'G'})
MERGE (h:Item {name:'H'})
MERGE (b)-[:ChildOf]->(a)
MERGE (b)-[:InactiveChildOf]->(a)
MERGE (c)-[:ChildOf]->(a)
MERGE (d)-[:ChildOf]->(b)
MERGE (e)-[:ChildOf]->(b)
MERGE (f)-[:ChildOf]->(c)
MERGE (f)-[:InactiveChildOf]->(c)
MERGE (g)-[:ChildOf]->(c)
MERGE (b)-[:ChildOf]->(h)
Note, I understand that I could simply put an "isActive" property on the ChildOf relationship or remove the relationship, but I am exploring options and trying to understand if this concept would work.
If the query is interpreted as "find all the nodes whose path from A passes only through nodes that are not related to the previous node by InactiveChildOf", it might look something like this:
match path = (p:Item{name:'A'})<-[:ChildOf*]-(c:Item)
with nodes(path) as nds
unwind range(0,size(nds)-2) as i
with nds,
nds[i] as i1,
nds[i+1] as i2
where not (i1)-[:InactiveChildOf]-(i2)
with nds,
count(i1) as test
where test = size(nds)-1
return head(nds),
last(nds)
Update: I think that this version is better (check that between two nodes there is no path that will contain at least one non-active type of relationship):
match path = (p:Item {name:'A'})<-[:ChildOf|InactiveChildOf*]-(c)
with p, c,
collect( [ r in relationships(path)
where type(r) = 'InactiveChildOf'
]
) as test
where all( t in test where size(t) = 0 )
return p, c
From reading and examining the graph (correct me if I'm wrong), the plain-text version of the Cypher query should be:
Find the nodes on a path to A such that no node on that path has an outgoing InactiveChildOf relationship.
So, in Cypher it would be:
MATCH p=(i:Item {name:"A"})<-[:ChildOf*]-(x)
WHERE NONE( x IN nodes(p) WHERE (x)-[:InactiveChildOf]->() )
UNWIND nodes(p) AS n
RETURN distinct n
Which returns A, C and G for the example data.
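The rule used in that last query (walk down the ChildOf tree, but drop any node that has an outgoing InactiveChildOf relationship, together with everything below it) can be traced by hand on the example data. Below is a Python sketch of that interpretation only; `active_descendants` is a hypothetical name, the edge lists are copied from the example data, and the function returns the descendants without the start node.

```python
# (child, parent) pairs copied from the example data.
CHILD_OF = [("B", "A"), ("C", "A"), ("D", "B"), ("E", "B"),
            ("F", "C"), ("G", "C"), ("B", "H")]
INACTIVE_CHILD_OF = [("B", "A"), ("F", "C")]


def active_descendants(root):
    """Descendants of `root` reachable without passing through a node
    that has any outgoing InactiveChildOf relationship."""
    children = {}
    for child, parent in CHILD_OF:
        children.setdefault(parent, []).append(child)
    inactive = {child for child, _ in INACTIVE_CHILD_OF}
    if root in inactive:  # the start node itself would fail the NONE(...) test
        return set()
    found, stack = set(), [root]
    while stack:
        for child in children.get(stack.pop(), []):
            if child in inactive or child in found:
                continue  # skip the node and do not descend below it
            found.add(child)
            stack.append(child)
    return found


print(sorted(active_descendants("A")))
print(sorted(active_descendants("H")))
```

In this sketch, starting from H yields nothing: B has an outgoing InactiveChildOf relationship (to A), so it is dropped regardless of which parent is asked about. Getting B, D and E for H would need a check tied to the specific parent, as in the earlier pairwise version.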

Carrying variables from one query to another to create chained nodes with cypher

This may be a stupid way to do this. I want to create a chain of nodes, possibly thousands of them in the following form:
(n0)-[r0]->(n1)-[r1]->(n2)...
I have programmatically generated Cypher which looks something like this:
MERGE (n0:Person)-[r0:RelType]->(n1:Person)
WITH n1 MERGE (n1:Person)-[r1:RelType]->(n2:Person)
WITH n2 MERGE (n2:Person)-[r2:RelType]->(n3:Person)
WITH n3 MERGE (n3:Person)-[r3:RelType]->(n4:Person)
WITH n4 MERGE (n4:Person)-[r4:RelType]->(n5:Person)
...
I then copied and pasted the above queries into the Neo4j web console and ran them, but got the following error:
Can't create node `n1` with labels or properties here. The variable is already declared in this context
I understand (or do I?) that we cannot use MERGE inside WITH. I also know we can bulk import nodes and relationships from CSV using Neo4jImport. But I was curious whether we can generate a bunch of Cypher statements, paste them into the Neo4j web console, and create the desired graph.
If the only thing you want is to create a long chain of nodes, you can just unwind a range :
CREATE INDEX ON :Person(id);
UNWIND range(1,100) AS i
MERGE (p:Person {id: i-1})
MERGE (p2:Person {id: i})
MERGE (p)-[:RelType]->(p2)
@Luanne is on the right track, but I think this is what you want:
CREATE (n1:Person)
WITH n1 AS n CREATE (n)-[:RelType]->(n1:Person)
WITH n1 AS n CREATE (n)-[:RelType]->(n1:Person)
WITH n1 AS n CREATE (n)-[:RelType]->(n1:Person)
WITH n1 AS n CREATE (n)-[:RelType]->(n1:Person)
(and so on...)
Except for the first line, all other lines are identical. I used CREATE because I don't think you want to use MERGE at all, since I believe you are trying to create totally new data. You can use MERGE instead if I am wrong about that.
I think you should remove the label from the node n1 supplied via the WITH when merging the relationship:
MERGE (n0:Person)-[r0:RelType]->(n1:Person)
WITH n1 MERGE (n1)-[r1:RelType]->(n2:Person)
WITH n2 MERGE (n2)-[r2:RelType]->(n3:Person)
WITH n3 MERGE (n3)-[r3:RelType]->(n4:Person)
WITH n4 MERGE (n4)-[r4:RelType]->(n5:Person)
(untested)
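Since the question is about generating these statements programmatically, here is a minimal Python sketch of a generator that emits the chain in the form shown just above, with the label only on the newly introduced node of each step. The function name `chain_query` and its parameters are made up for illustration, and, like the query above, the generated Cypher has not been run against Neo4j.

```python
def chain_query(length, label="Person", rel_type="RelType"):
    """Build one Cypher statement chaining `length` relationships.

    Variables carried through WITH are reused bare (no label), which
    avoids the "already declared in this context" error from the question.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    lines = [f"MERGE (n0:{label})-[r0:{rel_type}]->(n1:{label})"]
    for i in range(1, length):
        lines.append(
            f"WITH n{i} MERGE (n{i})-[r{i}:{rel_type}]->(n{i + 1}:{label})"
        )
    return "\n".join(lines)


print(chain_query(3))
```

For thousands of nodes, the UNWIND range(...) approach shown earlier keeps the statement short, whereas this generator produces one line per relationship.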
