How to move the properties and relationships of one node to another? - neo4j

I have two separate nodes in the graph, and at one point I will get a signal that those two nodes should actually be one. So I have to merge the two including their properties (there's no overlap) AND I should maintain the relationships as well.
Example graph: (d)->(a {id:1}), (b)->(c {name:"Sam"})
Desired result: (d)->(a {id:1, name:"Sam"}), (b)->(a {id:1, name:"Sam"})
The label a doesn't have to be the case really in the result - the point is we will have only one node representing the original two.
The following merges the properties fine.
MATCH (a:Entity {id:"1"}), (c:Entity {name:"Sam"})
SET a += c
But I can't seem to find a way to move/copy the relationships.
The specs:
The two nodes won't have the same properties
node a may have incoming and outgoing relationships
node c may have incoming relationships, non outgoing
Any thoughts?
Update:
The following works, assuming the type and properties coming to node c are known in advance. Can I improve this to make it more dynamic?
MATCH (a:Entity {id1:"1"}), (c:Entity {id2:"2"})
SET a += c
WITH a, c
MATCH (c)<-[r]-(b)
WITH a, c, r, b
MERGE (b)-[:REL_NAME {prop1:r.prop1, prop2:r.prop2}]->(a)
WITH c
DETACH DELETE c
Update 2:
The above throws unable to load relationship with id error sometimes and I'm not sure why, but it seems it's related to reading/writing at the same time.
Update 3:
This is a work around for the error:
MATCH (a:Entity {id1:"1"}), (c:Entity {id2:"2"})
SET a += c
WITH a, c
MATCH (c)<-[r]-(b)
WITH a, c, r, b
MERGE (b)-[r2:REL_NAME]->(a)
SET r2 += r
WITH r, c
DELETE r, c

The problem when moving relationships between nodes is that you can't set the relationship type dynamically.
So you can't MATCH all relationships from node c and recreate them on node a. This does not work:
MATCH (a:Entity {id1:"1"}), (c:Entity {id2:"2"})
WITH a, c
// get rels from node c
MATCH (c)-[r]-(b)
WITH a, c, r, b
// create the same rel from node a
MERGE (b)-["r.rel_type" {prop1:r.prop1, prop2:r.prop2}]->(a)
With neo4j 3 there are so called procedures which are plugins for the server and can be called from Cypher queries. The apoc procedure package provides procedures that do what you need: https://neo4j-contrib.github.io/neo4j-apoc-procedures/
First install the apoc plugin on your server, then use something like:
// create relationships with dynamic types
CALL apoc.create.relationship(person1,'KNOWS',{key:value,…​}, person2)
// merge nodes
CALL apoc.refactor.mergeNodes([node1,node2])

Related

Neo4j Cypher complex query optimization

Now I have a graph with millions of nodes and millions of edge relationships. There is a directed relationship between nodes.
Now suppose the node has two states A and B. I want to find all state A nodes on the path that do not have state B.
As shown in the figure below, there are nodes A--K, and then three of them, E, G and J, are of type B, and the others are of type A.
picture link is https://i.stack.imgur.com/a0yOV.jpg
For node E, its upstream and downstream traversal is shown below, so nodes B, H, K do not meet the requirements.
For node G, its upstream and downstream traversal is shown below, so nodes B, D, K do not meet the requirements.
For node J, its upstream and downstream traversal is shown below, so nodes A, B, C, D, F do not meet the requirements.
So finally only node "I" is the node that meets the requirements.
picture link is https://i.stack.imgur.com/A2eqv.jpg
The case of the above example is a DAG, but the actual situation is that there may be cycle in the graph, including spin cycle (case 1), AB cycle (case 2), large loops (case 3), and complex cycle (case 4)
picture link is https://i.stack.imgur.com/NDpED.jpg
The Cypher query statement I can write
MATCH (n:A)
WHERE NOT exists((n)-[*]->(:B))
AND NOT exists((n)<-[*]-(:B))
RETURN n;
But this query statement is stuck in the case of millions of nodes and millions of edges with a limit 35,But in the end there are more than 30,000 nodes that meet the requirements.
Obviously my statement is taking up too much memory, querying out 30+ nodes has taken up almost all the available memory, how can I write a more efficient query?
Here is a example
CREATE (a:A{id:'a'})
CREATE (b:A{id:'b'})
CREATE (c:A{id:'c'})
CREATE (d:A{id:'d'})
CREATE (e:B{id:'e'})
CREATE (f:A{id:'f'})
CREATE (g:B{id:'g'})
CREATE (h:A{id:'h'})
CREATE (i:A{id:'i'})
CREATE (j:B{id:'j'})
CREATE (k:A{id:'k'})
MERGE (a)-[:REF]->(c)
MERGE (b)-[:REF]->(c)
MERGE (b)-[:REF]->(d)
MERGE (b)-[:REF]->(e)
MERGE (c)-[:REF]->(f)
MERGE (d)-[:REF]->(g)
MERGE (e)-[:REF]->(g)
MERGE (e)-[:REF]->(h)
MERGE (f)-[:REF]->(i)
MERGE (f)-[:REF]->(j)
MERGE (f)-[:REF]->(k)
MERGE (g)-[:REF]->(k)
MERGE (g)-[:REF]->(j)
use this code will get the result 'i'
MATCH (n:A)
WHERE NOT exists((n)-[*]->(:B))
AND NOT exists((n)<-[*]-(:B))
RETURN n;
But when there are 800,000 nodes (400,000 type A, 400,000 type B) and over 1.4 million edges in the graph, this code cannot run the result
Some thoughts:
I don’t think this global graph search can be solved with a single query. You will need some kind of process to optimise exploration and use the result up to a certain point in subsequent steps.
when you could assign node labels instead of properties to reflect
the state of a node, you could use apoc.path.expandConfig to just
explore paths until you hit a node with state B.
you don’t need to re-investigate state A nodes that you traverse before you hit a node with state B, because they will not meet the requirements.
Another approach could be this, given the fact that all nodes that are on the up or downstream paths from a B node, will not fulfil the requirements. Still assuming that you use labels to distinguish A and B nodes.
MATCH (b:B)
CALL apoc.path.spanningTree(b,
{relationshipFilter: "<",
labelFilter:"/B"
}
) YIELD path
UNWIND nodes(path) AS downStreamNode
WITH b,COLLECT(DISTINCT downStreamNode) AS downStreamNodes
CALL apoc.path.spanningTree(b,
{relationshipFilter: ">",
labelFilter:"/B"}
) YIELD path
UNWIND nodes(path) AS upStreamNode
WITH b,downStreamNodes+COLLECT(DISTINCT upStreamNode) AS upAndDownStreamNodes
RETURN apoc.coll.toSet(apoc.coll.flatten(COLLECT(upAndDownStreamNodes))) AS allNodesThatDoNotFulfillRequirements

Deleting duplicate relationships in neo4j - is this correct?

I have developed a query which, by trial and error, appears to find all of the duplicated relationships in a Neo4j DB. I want delete all but one of these relationships but I'm concerned that I have not thought of problematic cases that could result in data deletion.
So, does this query delete all but one of a duplicated relationship?
MATCH (a)-->(b)<--(a) # identify where the duplication is present
WITH DISTINCT a, b
MATCH (a)-[r]->(b) # get all duplicated paths themselves
WITH a, b, collect(r)[1..] as rs # remove the first instance from the list
UNWIND rs as r
DELETE r
If I replace the UNWIND rs as r; DELETE r with WITH a, b, count(rs) as cnt RETURN cnt it seems to return the unnecessary relationships.
I'm still relucant to put this somewhere to be used by others, though....
Thanks
First of all, let me (strictly) define the term: "duplicate relationships". Two relationships are duplicates if they:
Connect the same pair of nodes (call them a and b)
Have the same relationship type
Have exactly the same set of properties (both names and values)
Have the same directionality between a and b (iff directionality is significant for use case)
Your query only considers #1 and #4, so it generally could delete non-duplicate relationships as well.
Here is a query that will take all of the above into consideration (assuming #4 should be included):
MATCH (a)-[r1]->(b)<-[r2]-(a)
WHERE TYPE(r1) = TYPE(r2) AND PROPERTIES(r1) = PROPERTIES(r2)
WITH a, b, apoc.coll.union(COLLECT(r1), COLLECT(r2))[1..] AS rs
UNWIND rs as r
DELETE r
Aggregating functions (like COLLECT) use non-aggregated terms as grouping keys, so there is no need for the query to perform a separate redundant DISTINCT a,b test.
The APOC function apoc.coll.union returns the distinct union of its 2 input lists.

Cypher (Neo4j) create relationship to all other nodes (except itself)

I have created 10 nodes in Neo4j.
How do I quickly and easily create relationships between all of them? (from all, to all, excluding itself and without duplicated relationships?)
For example, if I were to have 3 nodes called A, B, and C:
A - B
A - C
B - C
This should work:
MATCH (n), (m)
WHERE ID(n) < ID(m)
CREATE (n)-[:FOO]->(m)
The WHERE test ensures that n and m are different, and also that the same pair is not processed a second time (in reverse order).

Replacing relations from one node to another by one single relation

I have been pushing several times the same relationship between 2 nodes in Neo4j.
It was a mistake as it makes the visualization less clear.
Now, I would like to replace those several relations between 2 nodes by one single relation. It would be great if we could keep the number of relations inside a property "count" on the new unique relation.
What would be an efficient way to solve this problem ?
I have about 100 000 of relations and I am a bit worried about the time it would take.
Here is a quick example to make the problem clearer :
I have :
Node A -- R1 -- Node B
Node A -- R2 -- Node B
And I would like to have
Node A -- R {count : 2} -- Node B
Thanks!
I assume these relationships don't have any properties and Direction of the relationships doesn't matter.
You can combine these relationships with Cypher Query as shown:
MATCH (p:Node)-[r]-(c:Node)
WHERE ID(p) > ID(c)
DELETE r
WITH p, c, COUNT(r) as count
CREATE (p)-[:R{count:count}]->(c)
If you want to merge relationships having the same directions only then you can use the following query:
MATCH (p:Node)-[r]->(c:Node)
DELETE r
WITH p, c, COUNT(r) as count
CREATE (p)-[newrel:R{count:count}]->(c)
If you want to merge the properties as well then you can take help of
apoc plugin's apoc.refactor.mergeRelationships method.

Filtering out nodes on two cypher paths

I have a simplified Neo4j graph (old version 2.x) as the image with 'defines' and 'same' edges. Assume the number on the define edge is a property on the edge
The queries I would like to run are:
1) Find nodes defined by both A and B -- Requried result: C, C, D
START A=node(885), B=node(996) MATCH (A-[:define]->(x)<-[:define]-B) RETURN DISTINCT x
Above works and returns C and D. But I want C twice since its defined twice. But without the distinct on x, it returns all the paths from A to B.
2)Find nodes that are NOT (defined by both A,B OR are defined by both A,B but connected via a same edge) -- Required result: G
Something like:
R1: MATCH (A-[:define]->(x)<-[:define]-B) RETURN DISTINCT x
R2: MATCH (A-[:define]->(e)-(:similar)-(f)<-[:define]-B) RETURN e,f
(Nodes defined by A - (R1+R2) )
3) Find 'middle' nodes that do not have matching calls from both A and B --Required result: C,G
I want to output C due to the 1 define(either 45/46) that does not have a matching define from B.
Also output G because there's no define to G from B.
Appreciate any help on this!
Your syntax is a bit strange to me, so I'm going to assume you're using an older version of Neo4j. We should be able to use the same approaches, though.
For #1, Your proposed match without distinct really should be working. The only thing I can see is adding missing parenthesis around A and B node variables.
START A=node(885), B=node(996)
MATCH (A)-[:define]->(x)<-[:define]-(B)
RETURN x
Also, I'm not sure what you mean by "returns all paths from A to B." Can you clarify that, and provide an example of the output?
As for #2, we'll need several several parts to this query, separating them with WITH accordingly.
START A=node(885), B=node(996)
MATCH (A)-[:define]->(x)<-[:define]-(B)
WITH A, B, COLLECT(DISTINCT x) as exceptions
OPTIONAL MATCH (A)-[:define]->(x)-[:same]-(y)<-[:define]-(B)
WHERE x NOT IN exceptions AND y NOT IN exceptions
WITH A, B, exceptions + COLLECT(DISTINCT x) + COLLECT(DISTINCT y) as allExceptions
MATCH (aNode)
WHERE aNode NOT IN allExceptions AND aNode <> A AND aNode <> B
RETURN aNode
Also, you should really be using labels on your nodes. The final match will match all nodes in your graph and will have to filter down otherwise.
EDIT
Regarding your #3 requirement, the SIZE() function will be very helpful here, as you can get the size of a pattern match, and it will tell you the number of occurrences of that pattern.
The approach on this query is to first get the collection of nodes defined by A or B, then filter down to the nodes where the number of :defines relationships from A are not equal to the number of :defines relationships from B.
While we would like to use something like a UNION WITH in order to get the union of nodes defined by A and union it with the nodes defined by B, Neo4j's UNION support is weak right now, as it doesn't let you do any additional operations after the UNION happens, so instead we have to resort to adding both sets of nodes into the same collection then unwinding them back into rows.
START A=node(885), B=node(996)
MATCH (A)-[:define]->(x)
WITH A, B, COLLECT(x) as middleNodes
MATCH (B)-[:define]->(x)
WITH A, B, middleNodes + COLLECT(x) as allMiddles
UNWIND allMiddles as middle
WITH DISTINCT A, B, middle
WHERE SIZE((A)-[:define]->(middle)) <> SIZE((B)-[:define]->(middle))
RETURN middle

Resources