Merging duplicate nodes and their relationship - neo4j

I have a requirement to merge duplicate nodes and keep one copy. The issue I am facing is that when I merge nodes, duplicate relationships are created. Instead, I want to merge the relationships as well, without duplicates.
Can you give some suggestions?
CREATE (n:People { name: 'Person1', lastname: 'Person1LastName', email_ID:'Person1#test2.com' })
CREATE (n:People { name: 'Person2', lastname: 'Person2LastName', email_ID:'Person2#test2.com' })
CREATE (n:People { name: 'Person2', lastname: 'Person2LastName', staysin:'California' })
CREATE (n:People { name: 'Person3', lastname: 'Person3LastName', email_ID:'Person3#test2.com' })
Person2 -[r:Has_Met]->(Person1)
(Person3)-[r:FRIENDS_WITH]->(Person2) having email_ID='Person2#test2.com'
Now I want to keep a single Person2 node and keep both relationships to the other nodes,
something like this:
MATCH (p:People{name:"person1"})
WITH p.name as name, collect(p) as nodes, count(*) as cnt
WHERE cnt > 1
WITH head(nodes) as first, tail(nodes) as rest
UNWIND rest AS to_delete
MATCH (to_delete)-[r:HAS_MET]->(e:name)
MERGE (first)-[r1:HAS_MET]->(e)
on create SET r1=r
SET to_delete.isDuplicate=true
RETURN count(*);
This is from a related question, but here only one relationship type (HAS_MET) is considered. How do I handle all the relationships at once?

Without presentation of your model or listing of sample data, unfortunately, I am only able to answer in general, which may help you nevertheless.
Have a look at the APOC library and consider the use of the procedures Merge Nodes and Redirect Relationship To. You will find explanatory images and Cypher statements there for each case.
Extension after question update
Initial situation
CREATE
(p1:People {name: 'Person1', lastname: 'Person1LastName', email_ID: 'Person1#test2.com'}),
(p2a:People {name: 'Person2', lastname: 'Person2LastName', email_ID: 'Person2#test2.com'}),
(p2b:People {name: 'Person2', lastname: 'Person2LastName', staysin: 'California'}),
(p3:People {name: 'Person3', lastname: 'Person3LastName', email_ID: 'Person3#test2.com'}),
(p2a)-[:HAS_MET]->(p1),
(p2b)-[:HAS_MET]->(p1),
(p3)-[:FRIENDS_WITH]->(p2a);
Solution
MATCH (oneNode:People {email_ID: 'Person2#test2.com'}), (otherNode:People {staysin: 'California'})
CALL apoc.refactor.mergeNodes([oneNode, otherNode])
YIELD node
MATCH (node)-[relation:HAS_MET]->(:People)
WITH tail(collect(relation)) AS surplusRelations
UNWIND surplusRelations AS surplusRelation
DELETE surplusRelation;
line 1: select the two nodes to be combined
line 2: call the appropriate merge nodes procedure
line 3: define the result variable
line 4: identify all relationships between the combined node and a met person (there are at least two)
line 5: select all relationships but the first one
line 6: unwind the surplus relationships into individual rows
line 7: delete all surplus relationships
Result
merged node Person2, containing all attributes from source nodes (note especially email_ID and staysin)
one relationship Person1-Person2
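To cover every relationship type at once rather than only HAS_MET, a possible variant is to let the merge procedure collapse parallel relationships itself. This is a minimal sketch, assuming an APOC release whose apoc.refactor.mergeNodes accepts the mergeRels and properties configuration keys, and assuming that nodes sharing the same name count as duplicates (that grouping rule is an assumption, not something the question states).

```cypher
// Group People nodes by name; every group with more than one node is treated as a set of duplicates (assumption).
MATCH (p:People)
WITH p.name AS name, collect(p) AS nodes
WHERE size(nodes) > 1
// mergeRels: true asks APOC to merge relationships of the same type and direction instead of keeping copies.
// properties: 'discard' keeps the first node's value when both nodes define the same property.
CALL apoc.refactor.mergeNodes(nodes, {properties: 'discard', mergeRels: true})
YIELD node
RETURN node;
```

With the sample data above, this should leave one Person2 node carrying both email_ID and staysin, one HAS_MET relationship to Person1, and the FRIENDS_WITH relationship from Person3. Check the configuration keys against the documentation of your installed APOC version before relying on it.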

Related

Iterate over list of objects and create nodes by properties of objects Neo4j

I have a list of objects, the objects have different properties.
array = [{name: 'Armen', age: 26}, {name: 'Alex', profession: 'Scientist'}]
I need to iterate over list and create nodes with properties provided by objects. Which is the easiest and best practice way to do it? Thanks in advance! I have tried to use unwind both for list and object keys
WITH array AS nodes
UNWIND nodes AS node
UNWIND keys(node) AS prop
WITH node, prop
MERGE (man: Man {prop:node[prop]})
RETURN man
but in this case I get one node for each property.
Unwinding prop will result in separate rows for each property and that is why it doesn't work. You can keep the properties in one collection like this,
WITH [{name: 'Armen', age: 26}, {name: 'Alex', profession: 'Scientist'}] AS nodes
UNWIND nodes AS node
WITH node, properties(node) as props
MERGE (man:Man {name: props.name}) ON CREATE SET man += props
RETURN man
(I'm assuming that name is common to all items.)
As @aldrin and Cobra from the Neo4j Community showed, I can add properties with SET. Here is the easiest way I found to do it.
WITH [{name: 'Armen', age: 26}, {name: 'Alex', profession: 'Scientist'}]
AS nodes
UNWIND nodes AS node
CREATE (man:Man)
SET man += node
RETURN man
I added Ids to my nodes so I can create and update nodes by just one query.
WITH [{id: 0, name: 'Armen', age: 26}, {id: 1, name: 'Alex', profession: 'Scientist'}] AS nodes
UNWIND nodes AS node
MERGE (man:Man {id: node.id})
SET man += node
RETURN man

Conditional partial merge of pattern into graph

I'm trying to create a relationship that connects a person to a city -> state -> country without recreating the city/state/country nodes and relationships if they do already exist - so I'd end-up with only one USA node in my graph for example
I start with a person
CREATE (p:Person {name:'Omar', Id: 'a'})
RETURN p
Then I'd like to turn this into an apoc.do.case statement with APOC,
or turn it into one MERGE statement, using a unique constraint, that creates a new node if no node is found and otherwise matches an existing node.
// first case where the city/state/country all exist
MATCH (locality:Locality{name:"San Diego"})-[:SITUATED_IN]->(adminArea:AdministrativeArea { name: 'California' })-[:SITUATED_IN]->(country:Country { name: 'USA' })
MERGE (p)-[:SITUATED_IN]->(locality)-[:SITUATED_IN]->(adminArea)-[:SITUATED_IN]->(country)
return p
// second case where only state/country exist
MATCH (adminArea:AdministrativeArea { name: 'California' })-[:SITUATED_IN]->(country:Country { name: 'USA' })
MERGE (p)-[:SITUATED_IN]->(locality:Locality{name:"San Diego"})-[:SITUATED_IN]->(adminArea)-[:SITUATED_IN]->(country)
return p
// third case where only country exists
MATCH (country:Country { name: 'USA' })
MERGE (p)-[:SITUATED_IN]->(locality:Locality{name:"San Diego"})-[:SITUATED_IN]->(adminArea:AdministrativeArea { name: 'California' })-[:SITUATED_IN]->(country)
return p
// last case where none of city/state/country exist, so I have to create all nodes + relations
MERGE (p)-[:SITUATED_IN]->(locality:Locality{name:"San Diego"})-[:SITUATED_IN]->(adminArea:AdministrativeArea { name: 'California' })-[:SITUATED_IN]->(country:Country { name: 'USA' })
return p
The key here is I only want to end-up with one (California)->(USA). I don't want those nodes & relationships to get duplicated
Your queries that use MATCH never specify which Person you want. Variable names like p only exist for the life of a query (and sometimes not even that long). So p is unbound in your MATCH queries, and can result in your MERGE clauses creating empty nodes. You need to add MATCH (p:Person {Id: 'a'}) to the start of those queries (assuming all people have unique Id values).
It should NOT be the responsibility of every single query to ensure that all needed localities exist and are connected correctly -- that is way too much complexity and overhead for every query. Instead, you should create the appropriate localities and inter-locality relationships separately -- before you need them. In fact, it should be the responsibility of each query that creates a locality to create all the relationships associated with it.
A MERGE creates the entire specified pattern unless every single thing in the pattern already exists. To avoid duplicates, a MERGE pattern should therefore have at most 1 thing that might not already exist. That means at most 1 relationship, and if the pattern has a relationship, its 2 end nodes should already be bound (by MATCH clauses, for example).
Once the Locality nodes and the inter-locality relationships exist, you can add a person like this:
MATCH (locality:Locality {name: "San Diego"})
MERGE (p:Person {Id: 'a'}) // create person if needed, specifying a unique identifier
ON CREATE SET p.name = 'Omar' // set other properties as needed
MERGE (p)-[:SITUATED_IN]->(locality); // create relationship if necessary
The above considerations should help you design the code for creating the Locality nodes and the inter-locality relationships.
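Following the "at most one new thing per MERGE" rule, the locality hierarchy itself could be set up roughly like this. It is only a sketch under the assumption that a name identifies a locality, state, or country uniquely; real city names repeat across states, so a production model would need a more specific key.

```cypher
// Each node is merged on its own, so an existing node is reused rather than duplicated.
MERGE (country:Country {name: 'USA'})
MERGE (adminArea:AdministrativeArea {name: 'California'})
MERGE (locality:Locality {name: 'San Diego'})
// Each relationship is merged between two already-bound nodes, so only the relationship can be new.
MERGE (adminArea)-[:SITUATED_IN]->(country)
MERGE (locality)-[:SITUATED_IN]->(adminArea);
```

Running this statement repeatedly should leave a single (California)->(USA) path, because every MERGE either finds its one element or creates exactly that element.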
Finally, the solution I used is much simpler, it's a series of merges.
match (person:Person {Id: 'Omar'}) // that should be present in the graph
merge (country:Country {name: 'USA'})
merge (state:State {name: 'California'})-[:SITUATED_IN]->(country)
merge (city:City {name: 'Los Angeles'})-[:SITUATED_IN]->(state)
merge (person)-[:SITUATED_IN]->(city)
return person;

Neo4j - LOAD-CSV not creating all nodes

I am just getting started on Neo4J, and I am trying to load some data into Neo4j 3.1 using LOAD CSV with the following script:
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///Fake59.csv" AS line
MERGE (person:Person {firstName: line.GivenName, middleInitial: line.MiddleInitial, lastName: line.Surname, title: line.Title,
gender: line.Gender, birthday: line.Birthday, bloodType: line.BloodType, weight: line.Pounds, height: line.FeetInches})
MERGE (contact:Contact {phoneNumber: line.TelephoneNumber, email: line.EmailAddress})
MERGE (person)-[:CONTACTED_AT]->(contact)
MERGE (color:Color {name: line.Color})
MERGE (person)-[:FAVORITE_COLOR]->(Color)
MERGE (address:Address {streetAddress: line.StreetAddress, city: line.City, zipCode: line.ZipCode})
MERGE (person)-[:LIVES_AT]->(address)
MERGE (state:State {abbr: line.State, name: line.StateFull})
MERGE (city)-[:STATE_OF]->(stage)
MERGE (country:Country {name: line.CountryFull, abbr: line.Country, code: line.TelephoneCountryCode})
MERGE (state)-[:IN_COUNTRY]->(country)
MERGE (credentials:Credentials {userName: line.Username, password: line.Password, GUID: line.GUID})
MERGE (person)-[:LOGS_in]->(credentials)
MERGE (browser:Browser {agent: line.BrowserUserAgent})
MERGE (person)-[:BROWSES_WITH]->(browser)
MERGE (creditCard:CreditCard {number: line.CCNumber, cvv2: line.CVV2, expireDate: line.CCExpires})
MERGE (person)-[:USES_CC]->(creditCard)
MERGE (creditCompany:CreditCompany {name: line.CCType})
MERGE (creditCard)-[:MANAGED_BY]->(creditCompany)
MERGE (occupation:Occupation {name: line.Occupation})
MERGE (person)-[:WORKS_AS]->(occupation)
MERGE (company:Company {name: line.Company})
MERGE (person)-[:WORKDS_FOR]->(company)
MERGE (company)-[:EMPLOYES]->(occupation)
MERGE (vehicle:Vehicle {name: line.Vehicle})
MERGE (person)-[:DRIVES]->(vehicle)
The input file has about 50k rows. It runs for a few hours, but the process does not finish; if I query the database after that time, I see that only nodes of type Person were created. If I run a smaller file with only 3 entries, all the additional nodes and relationships are created.
I have already changed the amount of memory allocated to Neo4j and to the JVM, and still no success. I understand that MERGE takes longer than CREATE to be executed but I am trying to avoid duplication of nodes with the insert.
Any ideas or suggestions on what I should change or how I can improve this ?
Thank you,
--MD.
Try splitting your query into multiple smaller ones. That works better and is easier to manage. Also, when using MERGE you typically want to match on a single unique property, such as an email address for a person, and then set the remaining properties with ON CREATE SET. This should speed up the query. It looks like this:
MERGE (contact:Contact {email: line.EmailAddress})
ON CREATE SET contact.phoneNumber = line.TelephoneNumber
In your case with the person where there is no single unique property you can use a combination of many, but know that every property you add in the MERGE slows down the query.
MERGE (person:Person {firstName: line.GivenName, middleInitial: line.MiddleInitial, lastName: line.Surname})
ON CREATE SET person.title = line.Title, person.gender = line.Gender,
person.birthday = line.Birthday, person.bloodType = line.BloodType,
person.weight = line.Pounds, person.height = line.FeetInches
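A MERGE on a property can only be fast if that property is indexed; without an index, each MERGE scans all nodes with the label. The sketch below combines that with the advice to split the import into passes. It uses the Neo4j 3.x schema syntax because the question mentions 3.1, and it assumes that email addresses are unique per contact, which the question does not confirm.

```cypher
// Schema statements, run once before the import (Neo4j 3.x syntax; newer releases use
// CREATE CONSTRAINT ... FOR (c:Contact) REQUIRE c.email IS UNIQUE).
CREATE CONSTRAINT ON (c:Contact) ASSERT c.email IS UNIQUE;
CREATE INDEX ON :Person(lastName);

// Pass 1: create the Person nodes only.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///Fake59.csv" AS line
MERGE (person:Person {firstName: line.GivenName, middleInitial: line.MiddleInitial, lastName: line.Surname})
ON CREATE SET person.title = line.Title, person.gender = line.Gender;

// Pass 2: look up each person again, then add the contact and its relationship.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///Fake59.csv" AS line
MATCH (person:Person {firstName: line.GivenName, middleInitial: line.MiddleInitial, lastName: line.Surname})
MERGE (contact:Contact {email: line.EmailAddress})
ON CREATE SET contact.phoneNumber = line.TelephoneNumber
MERGE (person)-[:CONTACTED_AT]->(contact);
```

The remaining node types (address, state, credentials, and so on) would follow the same pattern, one pass per node type, each with its own index or constraint on the merge key.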

Match node with multiple properties in neo4j

I'm having troubles with the current data model and was wondering if anyone could suggest a better way to structure my data.
My nodes labelled 'Person' have 5-10 properties each like: Name, Address, Nationality, Phone, Age... And there is no unique property I could use as an Index.
Since I don't want duplicates, every time I create a new person I use MERGE instead of CREATE. But the problem is that doing a MERGE where I'm matching 5-10 properties on a node slows down the queries enormously.
So would taking out the properties in each Person node as separate nodes labeled Address, Nationality, Phone, Age help the performance of my MERGE query? Any other possible solutions?
Thanks in advance!
How about generating a GUID/UUID that's unique for each person and using that as your ID for each Person? Then you could MERGE on that property quickly and use ON CREATE to set the other properties e.g.
MERGE (p:Person {id: 'abcd-efgh-...'})
ON CREATE SET p.name = "mark", p.address = "..."
Or if not then maybe a hash or combination of all those properties as your key e.g.
MERGE (p:Person {id: "name-address-nationality-phone"})
ON CREATE SET p.name = "...", p.address = "..."
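The combined key can also be built inside the query itself. The following is a rough sketch with made-up sample values; the '|' separator and the choice of which properties go into the key are assumptions for illustration.

```cypher
// Hypothetical input row; in practice this would come from a parameter or LOAD CSV.
WITH {name: 'Mark', address: '1 Main St', nationality: 'UK', phone: '555-0100'} AS row
// Concatenate the identifying properties into one key. coalesce guards against nulls,
// because MERGE refuses a null property value.
WITH row,
     coalesce(row.name, '') + '|' + coalesce(row.address, '') + '|' +
     coalesce(row.nationality, '') + '|' + coalesce(row.phone, '') AS key
MERGE (p:Person {id: key})
ON CREATE SET p += row
RETURN p;
```

With an index or uniqueness constraint on :Person(id), the MERGE becomes a single indexed lookup instead of a comparison across all properties.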

Redundancy of graph in Neo4j

I have created a small graph in Neo4j, and the respective nodes and relationships were created. If I run the same code again, the nodes and relationships are created again, instead of a message saying that they already exist, as Oracle would display.
MERGE (a:Person1 { name : 'ROGER', title : 'Developer', age :28})
MERGE (b:Person2 { name : 'Britney', title : 'financier',age :32})
MERGE (c:Person3 { name : 'Christian', title : 'tester',age :24})
Create (a)-[:HUSBAND{last_name:'WHITE'}]->(b) RETURN a,b,c;
So I want to clarify whether Neo4j allows such duplication, that is, whether the nodes will be created again on every run.
Thanks in advance...
For reference, the MERGE statements are not creating new persons, only your CREATE statement in the end, see http://console.neo4j.org/r/qrzr6u saying upon re-execution
created 1 relationship set 1 property
You probably want the MERGE on all statements:
MERGE (a:Person1 { name : 'ROGER', title : 'Developer', age :28 })
MERGE (b:Person2 { name : 'Britney', title : 'financier', age :32 })
MERGE (c:Person3 { name : 'Christian', title : 'tester', age :24 })
MERGE (a)-[:HUSBAND { last_name:'WHITE' }]->(b)
RETURN a,b,c;
See http://console.neo4j.org/r/vmfl2v for an example.
MERGE does not re-create data if it already exists. CREATE always creates data, even if it already exists.
The documentation on merge points out that it always matches on the full pattern.
In the case of the cypher snippet you gave us, if you ran it twice you should end up with only one copy of Roger, Britney, and Christian, but I would expect two separate relationships between Roger and Britney, because CREATE always creates.
Watch out for the gotcha on MERGE though, it always merges on the full pattern you specify. So for example if you do this:
MERGE (a:Person {fname: "Henry"});
MERGE (a:Person {fname: "Henry", lname: "Banks"});
Then you get two Henrys, one with no lname property, and one with. This is because the second MERGE looks for a Person node with fname:Henry, lname:Banks and fails to find it, so it creates one. It does not add an extra property to an existing node. This is a common trip-up using MERGE.
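One way around this, sketched here on the assumption that fname alone is enough to identify the node, is to MERGE on the identifying property only and add the rest with SET:

```cypher
// Match or create Henry by the identifying property only.
MERGE (a:Person {fname: "Henry"})
// Then add or update the extra property on whichever node was found or created.
SET a.lname = "Banks";
```

Use ON CREATE SET instead of SET if the extra property should only be written when the node is new.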
Another common trip-up using MERGE (again due to the "whole pattern match") is this:
MERGE (a:Person {name:"Henry"})-[:knows]->(b:Person {name: "Mary"});
MERGE (a:Person {name:"Henry"})-[:married]->(b:Person {name: "Mary"});
This ends up creating two Henry's and two Mary's.
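A sketch of the usual way to avoid this: merge each node on its own first, then merge each relationship between the two bound nodes.

```cypher
// Nodes first, each in its own MERGE, so existing nodes are reused.
MERGE (a:Person {name: "Henry"})
MERGE (b:Person {name: "Mary"})
// Then the relationships, between the nodes bound above.
MERGE (a)-[:knows]->(b)
MERGE (a)-[:married]->(b);
```

Run as a single statement, this leaves one Henry, one Mary, and one relationship of each type, however many times it is executed.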

Resources