Neo4j Cypher: Merge duplicate nodes

I have some duplicate nodes, all with the label Tag. By duplicates I mean two or more nodes with the same name property, for example:
{ name: writing, _id: 57ec2289a90f9a2deece7e6d},
{ name: writing, _id: 57db1da737f2564f1d5fc5a1},
{ name: writing }
The _id field is no longer used, so for all practical purposes these three nodes are the same, except that each of them has different relationships.
What I would like to do is:
1. Find all duplicate nodes (check):
MATCH (n:Tag)
WITH n.name AS name, COLLECT(n) AS nodelist, COUNT(*) AS count
WHERE count > 1
RETURN name, nodelist, count
2. Copy all relationships from the duplicate nodes into the first one
3. Delete all the duplicate nodes
Can this be achieved with a Cypher query, or do I have to write a script in some programming language (which is what I'm trying to avoid)?

APOC Procedures has some graph refactoring procedures that can help. I think apoc.refactor.mergeNodes() ought to do the trick.
Be aware that in addition to transferring all relationships from the other nodes onto the first node of the list, it will also apply any labels and properties from the other nodes onto the first node. If that's not something you want to do, then you may have to collect incoming and outgoing relationships from the other nodes and use apoc.refactor.to() and apoc.refactor.from() instead.
Here's the query for merging nodes:
MATCH (n:Tag)
WITH n.name AS name, COLLECT(n) AS nodelist, COUNT(*) AS count
WHERE count > 1
CALL apoc.refactor.mergeNodes(nodelist) YIELD node
RETURN node
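If combining labels and properties is not wanted, the relationship-by-relationship alternative mentioned above could look roughly like the following. This is a sketch under assumptions, not a tested migration: it assumes the installed APOC version provides apoc.refactor.from(rel, newStartNode) and apoc.refactor.to(rel, newEndNode), it arbitrarily keeps the node with the lowest internal id for each name, and it is meant to be run as three separate statements. Relationships between two duplicates of the same name are not treated specially and may end up as self-loops on the kept node, so try it on a copy of the data first.

```cypher
// 1) Move outgoing relationships of each duplicate so they start at the kept node.
MATCH (n:Tag)
WITH n ORDER BY id(n)                       // assumption: lowest id is the node to keep
WITH n.name AS name, collect(n) AS nodelist
WHERE size(nodelist) > 1
WITH head(nodelist) AS keep, tail(nodelist) AS dupes
UNWIND dupes AS dupe
MATCH (dupe)-[r]->()
CALL apoc.refactor.from(r, keep) YIELD output
RETURN count(output) AS movedOutgoing;

// 2) Move incoming relationships of each duplicate so they end at the kept node.
MATCH (n:Tag)
WITH n ORDER BY id(n)
WITH n.name AS name, collect(n) AS nodelist
WHERE size(nodelist) > 1
WITH head(nodelist) AS keep, tail(nodelist) AS dupes
UNWIND dupes AS dupe
MATCH (dupe)<-[r]-()
CALL apoc.refactor.to(r, keep) YIELD output
RETURN count(output) AS movedIncoming;

// 3) Delete the duplicates, which should no longer carry relationships worth keeping.
MATCH (n:Tag)
WITH n ORDER BY id(n)
WITH n.name AS name, collect(n) AS nodelist
WHERE size(nodelist) > 1
UNWIND tail(nodelist) AS dupe
DETACH DELETE dupe;
```

Unlike apoc.refactor.mergeNodes(), this leaves the labels and properties of the kept node untouched.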

The above Cypher query didn't work on my database (version 3.4.16).
What worked for me was:
MATCH (n:Tag)
WITH n.name AS name, COLLECT(n) AS nodelist, COUNT(*) AS count
WHERE count > 1
CALL apoc.refactor.mergeNodes(nodelist, {
  properties: "combine",
  mergeRels: true
})
YIELD node
RETURN node;

Related

To get nodes and relationships between two specified nodes for review

I have a database containing millions of nodes and edge data and I want to get all the nodes and relationships data between two specified nodes.
Below is the sample data for the graph which has 7 nodes and 7 relationships.
To traverse from the 1st node to the 7th node I can use a variable-length relationship pattern and get the nodes and relationships in between, but that approach requires knowing how many relationships and nodes lie between the two.
With a variable-length pattern we have to specify the number of hops at which the end node is reached, and it traverses in one direction only.
But in my case I know the start and end node and don't know how many relationships and nodes are in between them. Please suggest how I can write a Cypher query for this case.
I have used the APOC spanning tree procedure where it returns ‘path’ from the 1st and 7th element, but it does not return the nodes and relationships. Can I get nodes and relationships data in return using the spanning tree procedure and how?
Is there any other way to get all nodes and relations between two nodes without using the APOC procedure?
Here is the query with the APOC procedure:
MATCH (start:temp {Name:"Joel"}), (end:temp {Name:"Jack"})
CALL apoc.path.spanningTree(start, {terminatorNodes:[end]}) YIELD path
RETURN path
Note: In our graph database nodes can have multi direction relations.
[Sample nodes and relationships snapshot]: https://i.stack.imgur.com/nN9hk.png

I assume you do not want duplicates in your result, so my approach would be this:
MATCH (start:temp {Name:"Joel"}), (end:temp {Name:"Jack"})
MATCH p=shortestPath((start)-[*]->(end))
UNWIND nodes(p) AS node
UNWIND relationships(p) AS rel
RETURN COLLECT(DISTINCT node) as nodes, COLLECT(DISTINCT rel) as rels

It might be better to use the shortestPath function to find the shortest path between the two nodes:
MATCH (start:temp {Name:"Joel"}), (end:temp {Name:"Jack"})
MATCH p=shortestPath((start)-[*]->(end))
RETURN nodes(p) as nodes, relationships(p) as rels
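Both queries above follow relationships in one direction only and return a single shortest path. Since the question notes that relationships can point either way, a variant along the following lines may be closer to what is needed. It is only a sketch: the undirected pattern, the use of allShortestPaths, and the upper bound of 15 hops are assumptions chosen for illustration, and it still returns only the shortest paths rather than every possible path between the two nodes.

```cypher
MATCH (start:temp {Name: "Joel"}), (end:temp {Name: "Jack"})
// Undirected pattern: relationships are followed regardless of direction.
// The bound of 15 hops is an arbitrary assumption to keep the search from exploding.
MATCH p = allShortestPaths((start)-[*..15]-(end))
UNWIND nodes(p) AS node
UNWIND relationships(p) AS rel
RETURN collect(DISTINCT node) AS nodes, collect(DISTINCT rel) AS rels;
```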

Find a node with most connections to other unique nodes

Given:
Two node labels:
1000 (:A) nodes
1000 (:B) nodes
Constraints:
CREATE CONSTRAINT ON (a:A) ASSERT a.id IS UNIQUE;
CREATE CONSTRAINT ON (b:B) ASSERT b.id IS UNIQUE;
One unidirectional relationship type:
4000 [:RELATED_TO] relationships
Multiple (a:A)-[:RELATED_TO]->(b:B) paths
(Meaning, the same node (a:A) could be related to the same node (b:B) multiple times)
I'm trying to run a query that would show the paths of the node that is connected to the biggest number of other unique nodes in the graph. For example, if nodes (a1:A), (a2:A), (a3:A), and (a4:A) are all connected to (b:B) at least once, and it so happens that no other (:B) is connected to any more than three unique (:A) nodes elsewhere in the graph, I would like for the Neo4j Browser to show (b:B) in the center and (a1:A) through (a4:A) around it. I feel like my biggest challenge is that I haven't been able to figure out how to avoid counting up multiple (a1:A)-[:RELATED_TO]->(b:B) paths.
I'll be happy to provide more context if necessary. Thanks in advance!

This query uses the aggregating function COLLECT (with the DISTINCT operator to qualify its argument) to return the B node that has relationships with the most distinct A nodes, along with those A nodes:
MATCH (a:A)-[:RELATED_TO]->(b:B)
RETURN b, COLLECT(DISTINCT a) AS aNodes
ORDER BY SIZE(aNodes) DESC
LIMIT 1;
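If the goal is to have the Neo4j Browser draw the winning B node with its A nodes around it, one option is to pick the B node first and then match its paths again. The following is a sketch of that idea; the variable name distinctCount is made up for illustration.

```cypher
MATCH (a:A)-[:RELATED_TO]->(b:B)
// count(DISTINCT a) ignores repeated relationships between the same pair of nodes.
WITH b, count(DISTINCT a) AS distinctCount
ORDER BY distinctCount DESC
LIMIT 1
// Re-match the paths of the selected B node so they can be visualized.
MATCH p = (:A)-[:RELATED_TO]->(b)
RETURN p, distinctCount;
```

If several B nodes tie for the highest count, LIMIT 1 returns just one of them.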

Duplicate relationship in Neo4j

Why does this create two relationships instead of one?
MATCH (a:Person{name:'Barack'}), (b:Person{name:'Raback'})
CREATE (a)-[r:SHAKES_HANDS_WITH{id:toString(rand())}]->(b)
RETURN r
(Random number "id" is just added for demo purposes.)

You probably have 2 Person nodes with the same name (either 'Barack' or 'Raback').
Assuming that the other name has only a single node, the MATCH clause will produce 2 rows, which will cause the CREATE clause to be executed twice.
To verify if this is your scenario, this query will show you how many nodes have each name:
MATCH (a:Person)
WHERE a.name IN ['Barack', 'Raback']
RETURN a.name, COUNT(a) as nodeCount
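Another quick check, sketched here with the same labels and names as the question, is to count the rows that the original MATCH produces. CREATE runs once per row, so this number should equal the number of relationships created.

```cypher
MATCH (a:Person {name: 'Barack'}), (b:Person {name: 'Raback'})
// One row per (a, b) combination; a result of 2 means CREATE would run twice.
RETURN count(*) AS matchedRows;
```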

Neo4j cypher query for deleting all children nodes and its relationships except one child node

I am trying to delete child nodes except one child node.
When I execute this Cypher:
MATCH (n{name:'Java'})-[r]-(c)
return c.name
I get all the child node names, but I want to keep only the node with the longest name and delete the rest of the nodes along with their relationships.

This query should work:
MATCH (n{name:'Java'})--(c)
WHERE EXISTS(c.name)
WITH c ORDER BY LENGTH(c.name) DESC
SKIP 1
DETACH DELETE c;
It finds all c nodes that have a name property, orders them in descending order by the length of the name value, skips the c node with the longest name, and uses DETACH DELETE to delete the other c nodes and all their relationships.
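On newer Neo4j releases the query above may need small changes, because length() no longer accepts strings (as of Neo4j 4.0) and EXISTS(c.name) was replaced by IS NOT NULL (in Neo4j 5). A sketch of an equivalent query for such versions follows. The DISTINCT step is an added assumption: it guards against a child that is connected to the Java node by more than one relationship and would otherwise appear in several rows.

```cypher
MATCH (n {name: 'Java'})--(c)
WHERE c.name IS NOT NULL
// Collapse repeated rows for the same child before ordering.
WITH DISTINCT c
WITH c ORDER BY size(c.name) DESC
// Skip the child with the longest name and delete all the others.
SKIP 1
DETACH DELETE c;
```

If two children share the longest name length, which of them survives is not defined.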

Limiting ShortestPath in Cypher to nodes with specific properties

I am trying to figure out how to limit a shortest path query in cypher so that it only connects "Person" nodes containing a specific property.
Here is my query:
MATCH p = shortestPath( (from:Person {id: 1})-[*]-(to:Person {id: 2})) RETURN p
I would like to limit it so that when it connects from one Person node to another Person node, the Person node has to contain a property called "job" and a value of "engineer."
Could you help me construct the query? Thanks!

Your requirements are not very clear, but if you simply want one of the people to have an id of 1 and the other person to be an engineer, you would use this:
MATCH p = shortestPath( (from:Person {id: 1})-[*]-(to:Person {job: "engineer"}))
RETURN p;
This kind of query should be much faster if you also create indexes for the id and job properties of Person.
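The indexes could be created along these lines. This sketch assumes Neo4j 4.0 or later; older 3.x releases use the form CREATE INDEX ON :Person(id) instead.

```cypher
// Index for looking up the start node by id.
CREATE INDEX FOR (p:Person) ON (p.id);
// Index for looking up candidate end nodes by job.
CREATE INDEX FOR (p:Person) ON (p.job);
```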
