merging nodes into a new one with cypher and neo4j - neo4j

using Neo4j - Graph Database Kernel 2.0.0-M02 and the new merge function,
I was trying to merge nodes into a new one (merge does not really merges but binds to the returning identifier according to the documentation) and delete old nodes. I only care at the moment about properties to be transferred to the new node and not relationships.
What I have at the moment is the cypher below
merge (n:User {form_id:123}) //I get the nodes with form_id=123 and label User
with n match p=n //subject to change to have the in a collection
create (x) //create a new node
foreach(n in nodes(p): set x=n) //properties of n copied over to x
return n,x
Problems
1. When foreach runs it creates a new x for every n
2. Moving properties from n to x is replacing all properties each time with the new n
so if the 1st n node from merge has 2 properties a,b and the second c,d in the and after the set x=n all new nodes end up with c,d properties. I know is stated in the documentation so my question is:
Is there a way to merge all properties of N number of nodes (and maybe relationships as well) in a new node with cypher only?

I don't think the Cypher language currently has a syntax that non-destructively copies any and all properties from one node into another.
However, I'll present the solution to a simple situation that may be similar to yours. Let's say that some User nodes have the properties a & b, and some others have c & d. For example:
CREATE (:User { id:1,a: 1,b: 2 }),(:User { id:1,c: 3,d: 4 }),
(:User { id:2,a:10,b:20 }),(:User { id:2,c:30,d:40 });
This is how we would "merge" all User nodes with the same id into a single node:
MATCH (x:User), (y:User)
WHERE x.id=y.id AND has(x.a) AND has(y.c)
SET x.c = y.c, x.d = y.d
DELETE y
RETURN x
You can try this out in the neo4j sandbox at: http://console.neo4j.org/

With Neo4j-3.x it is also possible to merge two nodes into one using a specific apoc procedure.
First you need to download the apoc procedures jar file in your into your $NEO4J_HOME/plugins folder and start the Neo4j server.
Then you can call apoc.refactor.mergeNodes this way:
MATCH (x:User), (y:User)
WHERE x.id=y.id
call apoc.refactor.mergeNodes([x,y]) YIELD node
RETURN node
As I can see it, the resulting node would have all the properties of both x and y, choosing the values of y if both are set.

Related

MERGE Nodes by a Property field

I have different nodes that share one same property field, i need to merge these nodes into one and in the same time copy all the rest of the other properties in the merge node.
example:
(n1,g,p1) (n2,g,p2) (n3,g,p3) =>(n,g,p1,p2,p3)
Important to Note that i don't need the apoc solutions since user defined functions dosen't work in CAPS that i m working at
update :
geohash is the field that have a repeated values, so i want to merge the nodes by this field .
The CAPS team gave me this cypher query to have distinct geohash nodes from the intial graph:
CATALOG CREATE GRAPH temp {
FROM GRAPH session.inputGraph
MATCH (n)
WITH DISTINCT n.geohash AS geohash
CONSTRUCT
CREATE (:HashNode {geohash: geohash})
RETURN GRAPH
}
, however it still missing is the collect of the rest of the properties on the merged nodes.
I haven't a problem for the relationship ,cause we can copy them later from the intial graph:
FROM GRAPH inputGraph
MATCH (from)-[via]->(to)
FROM GRAPH temp
MATCH (n), (m)
WHERE from.geohash = n. AND AND to.geohash = m.geohash
CONSTRUCT
CREATE (n)-[COPY OF via]->(m)
RETURN GRAPH
It's not 100% possible in pure cypher, that's why there is an APOC procedure for that.
To merge two nodes , you have to :
create the merge node with all the properties
to create all the relationship of the nodes on the merge one
For the first part it's possible in cypher. Example :
MATCH (n) WHERE id(n) IN [106, 68]
WITH collect(n) AS nodes
CREATE (new:MyNode)
with nodes, new
UNWIND nodes as node
SET new += properties(node)
RETURN new
But for the second part, you need to be able to create relationship with a dynamic type and dynamic direction, and this is not allowed in cypher ...

Cypher - Delete node with given property and reconnect graph

I have a graph consisting of paths.
I need to delete all nodes, that have a property:
linksTo:'javascript'
After deleting i have to reconnect the paths. This means i need to create a new relationship for each gap. This relationship has a property named deltaTime that holds some integer value. This value (deltaTime) should be the sum of all deltaTime-properties of the deleted relationships of this path. Please look at the following picture for better understanding.
I don't know how to detect multiple "bad" nodes in a row, with a variable row-length.
It would help if you could provide the labels involved in your graph, as well as the relationship types you're using.
Assuming all these paths are chains (only single relationships connecting each node), something like this should work for you:
// first, add a label on all the nodes we plan on deleting
// it helps if they already have the same label especially if linksTo property is indexed.
MATCH (n{linksTo:'javascript'})
SET n:ToDelete
With the appropriate nodes labeled, we'll find the segments of nodes to delete, the surrounding nodes that need to be connected, and create the new relationship. We ensure a and b are start and end nodes of the chain by ensuring there are no :ToDelete nodes linking to a, or from b.
MATCH (a:ToDelete)
WHERE NOT (:ToDelete)-->(a)
AND ()-->(a)
MATCH p = (a)-[rel*0..]->(b:ToDelete)
WHERE ALL(node in nodes(p) WHERE node:ToDelete)
AND NOT (b)-->(:ToDelete)
AND (b)-->()
WITH a, b, REDUCE(s = 0, r IN rel | s + r.value) as sum
// now get the adjacent nodes we need to connect
MATCH (x)-[r1]->(a), (b)-[r2]->(y)
WITH x, y, sum + r1.value + r2.value AS sumValue
// making up relationship type and property name as I don't know what you're using
MERGE (x)-[:Rel{value: sumValue}]->(y)
Finally, when you're sure the new relationships look correct, delete the nodes you no longer need.
MATCH (n:ToDelete)
DETACH DELETE n

How to move the properties and relationships of one node to another?

I have two separate nodes in the graph, and at one point I will get a signal that those two nodes should actually be one. So I have to merge the two including their properties (there's no overlap) AND I should maintain the relationships as well.
Example graph: (d)->(a {id:1}), (b)->(c {name:"Sam"})
Desired result: (d)->(a {id:1, name:"Sam"}), (b)->(a {id:1, name:"Sam"})
The label a doesn't have to be the case really in the result - the point is we will have only one node representing the original two.
The following merges the properties fine.
MATCH (a:Entity {id:"1"}), (c:Entity {name:"Sam"})
SET a += c
But I can't seem to find a way to move/copy the relationships.
The specs:
The two nodes won't have the same properties
node a may have incoming and outgoing relationships
node c may have incoming relationships, non outgoing
Any thoughts?
Update:
The following works, assuming the type and properties coming to node c are known in advance. Can I improve this to make it more dynamic?
MATCH (a:Entity {id1:"1"}), (c:Entity {id2:"2"})
SET a += c
WITH a, c
MATCH (c)<-[r]-(b)
WITH a, c, r, b
MERGE (b)-[:REL_NAME {prop1:r.prop1, prop2:r.prop2}]->(a)
WITH c
DETACH DELETE c
Update 2:
The above throws unable to load relationship with id error sometimes and I'm not sure why, but it seems it's related to reading/writing at the same time.
Update 3:
This is a work around for the error:
MATCH (a:Entity {id1:"1"}), (c:Entity {id2:"2"})
SET a += c
WITH a, c
MATCH (c)<-[r]-(b)
WITH a, c, r, b
MERGE (b)-[r2:REL_NAME]->(a)
SET r2 += r
WITH r, c
DELETE r, c
The problem when moving relationships between nodes is that you can't set the relationship type dynamically.
So you can't MATCH all relationships from node c and recreate them on node a. This does not work:
MATCH (a:Entity {id1:"1"}), (c:Entity {id2:"2"})
WITH a, c
// get rels from node c
MATCH (c)-[r]-(b)
WITH a, c, r, b
// create the same rel from node a
MERGE (b)-["r.rel_type" {prop1:r.prop1, prop2:r.prop2}]->(a)
With neo4j 3 there are so called procedures which are plugins for the server and can be called from Cypher queries. The apoc procedure package provides procedures that do what you need: https://neo4j-contrib.github.io/neo4j-apoc-procedures/
First install the apoc plugin on your server, then use something like:
// create relationships with dynamic types
CALL apoc.create.relationship(person1,'KNOWS',{key:value,…​}, person2)
// merge nodes
CALL apoc.refactor.mergeNodes([node1,node2])

Find all relations starting with a given node

In a graph where the following nodes
A,B,C,D
have a relationship with each nodes successor
(A->B)
and
(B->C)
etc.
How do i make a query that starts with A and gives me all nodes (and relationships) from that and outwards.
I do not know the end node (C).
All i know is to start from A, and traverse the whole connected graph (with conditions on relationship and node type)
I think, you need to use this pattern:
(n)-[*]->(m) - variable length path of any number of relationships from n to m. (see Refcard)
A sample query would be:
MATCH path = (a:A)-[*]->()
RETURN path
Have also a look at the path functions in the refcard to expand your cypher query (I don't know what exact conditions you'll need to apply).
To get all the nodes / relationships starting at a node:
MATCH (a:A {id: "id"})-[r*]-(b)
RETURN a, r, b
This will return all the graphs originating with node A / Label A where id = "id".
One caveat - if this graph is large the query will take a long time to run.

finding the farthest node using Neo4j (node without any incoming relation)

I have created a graph db in Neo4j and want to use it for generalization purposes.
There are about 500,000 nodes (20 distinct labels) and 2.5 million relations (50 distinct types) between them.
In an example path : a -> b -> c-> d -> e
I want to find out the node without any incoming relations (which is 'a').
And I should do this for all the nodes (finding the nodes at the beginning of all possible paths that have no incoming relations).
I have tried several Cypher codes without any success:
match (a:type_A)-[r:is_a]->(b:type_A)
with a,count (r) as count
where count = 0
set a.isFirst = 'true'
or
match (a:type_A), (b:type_A)
where not (a)<-[:is_a*..]-(b)
set a.isFirst = 'true'
Where is the problem?!
Also, I have to create this code in neo4jClient, too.
Your first query will only match paths where there is a relationship [r:is_a], so counting r can never be 0. Your second query will return any arbitrary pair of nodes labeled :typeA that aren't transitively related by [:is_a]. What you want is to filter on a path predicate. For the general case try
MATCH (a)
WHERE NOT ()-->a
This translates roughly "any node that does not have incoming relationships". You can specify the pattern with types, properties or labels as needed, for instance
MATCH (a:type_A)
WHERE NOT ()-[:is_a]->a
If you want to find all nodes that have no incoming relationships, you can find them using OPTIONAL MATCH:
START n=node(*)
OPTIONAL MATCH n<-[r]-()
WITH n,r
WHERE r IS NULL
RETURN n

Resources