Neo4j - LOAD CSV not creating all nodes

I am just getting started with Neo4j, and I am trying to load some data into Neo4j 3.1 using LOAD CSV with the following script:
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///Fake59.csv" AS line
MERGE (person:Person {firstName: line.GivenName, middleInitial: line.MiddleInitial, lastName: line.Surname, title: line.Title,
gender: line.Gender, birthday: line.Birthday, bloodType: line.BloodType, weight: line.Pounds, height: line.FeetInches})
MERGE (contact:Contact {phoneNumber: line.TelephoneNumber, email: line.EmailAddress})
MERGE (person)-[:CONTACTED_AT]->(contact)
MERGE (color:Color {name: line.Color})
MERGE (person)-[:FAVORITE_COLOR]->(Color)
MERGE (address:Address {streetAddress: line.StreetAddress, city: line.City, zipCode: line.ZipCode})
MERGE (person)-[:LIVES_AT]->(address)
MERGE (state:State {abbr: line.State, name: line.StateFull})
MERGE (city)-[:STATE_OF]->(stage)
MERGE (country:Country {name: line.CountryFull, abbr: line.Country, code: line.TelephoneCountryCode})
MERGE (state)-[:IN_COUNTRY]->(country)
MERGE (credentials:Credentials {userName: line.Username, password: line.Password, GUID: line.GUID})
MERGE (person)-[:LOGS_in]->(credentials)
MERGE (browser:Browser {agent: line.BrowserUserAgent})
MERGE (person)-[:BROWSES_WITH]->(browser)
MERGE (creditCard:CreditCard {number: line.CCNumber, cvv2: line.CVV2, expireDate: line.CCExpires})
MERGE (person)-[:USES_CC]->(creditCard)
MERGE (creditCompany:CreditCompany {name: line.CCType})
MERGE (creditCard)-[:MANAGED_BY]->(creditCompany)
MERGE (occupation:Occupation {name: line.Occupation})
MERGE (person)-[:WORKS_AS]->(occupation)
MERGE (company:Company {name: line.Company})
MERGE (person)-[:WORKDS_FOR]->(company)
MERGE (company)-[:EMPLOYES]->(occupation)
MERGE (vehicle:Vehicle {name: line.Vehicle})
MERGE (person)-[:DRIVES]->(vehicle)
The input file has about 50k rows. The process runs for a few hours without finishing, and when I then query the database I see that only nodes of type Person were created. If I run a smaller file with only 3 entries, all the additional nodes and relationships are created.
I have already changed the amount of memory allocated to Neo4j and to the JVM, and still no success. I understand that MERGE takes longer than CREATE to be executed but I am trying to avoid duplication of nodes with the insert.
Any ideas or suggestions on what I should change or how I can improve this?
Thank you,
--MD.

Try splitting your query into multiple smaller ones; that works better and is easier to manage. Also, when using MERGE you should typically match on a single unique property, such as an email address, and then set the remaining properties with ON CREATE SET. This should speed up the query. It looks like this:
MERGE (contact:Contact {email: line.EmailAddress})
ON CREATE SET contact.phoneNumber = line.TelephoneNumber
In your case, where the person has no single unique property, you can use a combination of several, but be aware that every property you add to the MERGE slows down the query.
MERGE (person:Person {firstName: line.GivenName, middleInitial: line.MiddleInitial, lastName: line.Surname})
ON CREATE SET person.title = line.Title, person.gender = line.Gender,
person.birthday = line.Birthday, person.bloodType = line.BloodType,
person.weight = line.Pounds, person.height = line.FeetInches
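A rough sketch of how the advice above could be applied: load the file in several passes and back each MERGE key with a constraint or index, so the lookup does not have to scan every node. It assumes that EmailAddress is unique per contact and that the three name fields are enough to identify a person, neither of which the question confirms. The constraint and index statements use Neo4j 3.x syntax, and each statement is meant to be run on its own.

```cypher
// Assumption: one Contact per email address (Neo4j 3.x syntax).
CREATE CONSTRAINT ON (c:Contact) ASSERT c.email IS UNIQUE;
// An index on one of the name fields helps the Person MERGE find candidates.
CREATE INDEX ON :Person(lastName);

// Pass 1: people only.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///Fake59.csv" AS line
MERGE (person:Person {firstName: line.GivenName, middleInitial: line.MiddleInitial, lastName: line.Surname})
ON CREATE SET person.title = line.Title, person.gender = line.Gender;

// Pass 2: contacts only.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///Fake59.csv" AS line
MERGE (contact:Contact {email: line.EmailAddress})
ON CREATE SET contact.phoneNumber = line.TelephoneNumber;

// Pass 3: relationships between nodes that now exist.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///Fake59.csv" AS line
MATCH (person:Person {firstName: line.GivenName, middleInitial: line.MiddleInitial, lastName: line.Surname})
MATCH (contact:Contact {email: line.EmailAddress})
MERGE (person)-[:CONTACTED_AT]->(contact);
```

The same pattern would extend to the other node types. It may also be worth checking the variable names in the original script: Cypher variables are case-sensitive, so (Color) is not the (color) node merged on the previous line, and (city) and (stage) are never bound earlier in the query, which means those MERGE clauses match or create unrelated nodes.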

Related

How to prevent neo4j MERGE from creating duplicate relationships?

I am attempting to create nodes and relationships if they do not exist. I do not know ahead of time if anything in the DB exists.
This is my initial query:
MERGE (t:type { name: 'aaa'})
MERGE (m:model { name: 'bbb'})
MERGE (r:region {name: 'ccc'})
MERGE (p:param {name: 'ddd'})
MERGE (i:init {value: 123})
MERGE (u:forecast {url: 'http://something.png'})
MERGE (t)-[:HAS]-(m)-[:HAS]-(r)-[:HAS]-(p)-[:HAS]-(i)-[:HAS]-(u)
This correctly produces a graph like this:
Then I run this query again, but this time I change the name of the "model" object to "bbc" (instead of "bbb"):
MERGE (t:type { name: 'aaa'})
MERGE (m:model { name: 'bbc'})
MERGE (r:region {name: 'ccc'})
MERGE (p:param {name: 'ddd'})
MERGE (i:init {value: 123})
MERGE (u:forecast {url: 'http://something.png'})
MERGE (t)-[:HAS]-(m)-[:HAS]-(r)-[:HAS]-(p)-[:HAS]-(i)-[:HAS]-(u)
Now, however, my graph looks like this:
Everything looks correct except for the three duplicated relationships.
I realize that MERGE will create the whole path if it does not exist. There must be some way to avoid creating duplicate relationships, though.
I would appreciate being pointed in the right direction!
The MERGE statement checks whether the pattern as a whole already exists. If even one node differs, the whole pattern is treated as non-existent and every relationship in it is created again.
The solution is to split this MERGE statement into multiple, i.e. one MERGE for each relationship:
MERGE (t)-[:HAS]-(m)-[:HAS]-(r)-[:HAS]-(p)-[:HAS]-(i)-[:HAS]-(u)
becomes
MERGE (t)-[:HAS]-(m)
MERGE (m)-[:HAS]-(r)
MERGE (r)-[:HAS]-(p)
MERGE (p)-[:HAS]-(i)
MERGE (i)-[:HAS]-(u)
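To confirm that the split version no longer produces duplicates, a check along these lines could be run afterwards. It is only a sketch: it assumes every relationship of interest has type HAS, as in the question, and it returns no rows when each pair of nodes is connected by at most one such relationship in a given direction.

```cypher
// List node pairs that are joined by more than one HAS relationship.
MATCH (a)-[r:HAS]->(b)
WITH a, b, count(r) AS rels
WHERE rels > 1
RETURN labels(a) AS fromLabels, labels(b) AS toLabels, rels;
```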

Merging duplicate nodes and their relationship

I have a requirement to merge duplicate nodes and keep one copy. The issue I am facing is that when I merge nodes, duplicate relationships are created. Instead, I want to merge the relationships as well, without duplicates.
Can you give some suggestions?
CREATE (n:People { name: 'Person1', lastname: 'Person1LastName', email_ID:'Person1#test2.com' })
CREATE (n:People { name: 'Person2', lastname: 'Person2LastName', email_ID:'Person2#test2.com' })
CREATE (n:People { name: 'Person2', lastname: 'Person2LastName', staysin:'California' })
CREATE (n:People { name: 'Person3', lastname: 'Person3LastName', email_ID:'Person3#test2.com' })
Person2 -[r:Has_Met]->(Person1)
(Person3)-[r:FRIENDS_WITH]->(Person2) having email_ID='Person2#test2.com'
Now I want to keep a single Person2 node along with both of its relationships to the other nodes, something like this:
MATCH (p:People{name:"person1"})
WITH p.name as name, collect(p) as nodes, count(*) as cnt
WHERE cnt > 1
WITH head(nodes) as first, tail(nodes) as rest
UNWIND rest AS to_delete
MATCH (to_delete)-[r:HAS_MET]->(e:name)
MERGE (first)-[r1:HAS_MET]->(e)
on create SET r1=r
SET to_delete.isDuplicate=true
RETURN count(*);
This comes from a related question, but here only one relationship type (HAS_MET) is considered. How do I handle all the relationships at once?
Without presentation of your model or listing of sample data, unfortunately, I am only able to answer in general, which may help you nevertheless.
Have a look at the APOC library and consider the use of the procedures Merge Nodes and Redirect Relationship To. You will find explanatory images and Cypher statements there for each case.
Extension after question update
Initial situation
CREATE
(p1:People {name: 'Person1', lastname: 'Person1LastName', email_ID: 'Person1#test2.com'}),
(p2a:People {name: 'Person2', lastname: 'Person2LastName', email_ID: 'Person2#test2.com'}),
(p2b:People {name: 'Person2', lastname: 'Person2LastName', staysin: 'California'}),
(p3:People {name: 'Person3', lastname: 'Person3LastName', email_ID: 'Person3#test2.com'}),
(p2a)-[:HAS_MET]->(p1),
(p2b)-[:HAS_MET]->(p1),
(p3)-[:FRIENDS_WITH]->(p2a);
Solution
MATCH (oneNode:People {email_ID: 'Person2#test2.com'}), (otherNode:People {staysin: 'California'})
CALL apoc.refactor.mergeNodes([oneNode, otherNode])
YIELD node
MATCH (node)-[relation:HAS_MET]->(:People)
WITH tail(collect(relation)) AS surplusRelations
UNWIND surplusRelations AS surplusRelation
DELETE surplusRelation;
line 1: select the two nodes to be combined
line 2: call the appropriate merge nodes procedure
line 3: define the result variable
line 4: identify all relationships between the combined node and a met person (there are at least two)
line 5: select all relationships but the first one
line 6: iterate over the surplus relationships
line 7: delete all surplus relationships
Result
merged node Person2, containing all attributes from source nodes (note especially email_ID and staysin)
one relationship Person1-Person2
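As a possible shortcut, more recent APOC releases accept a configuration map on apoc.refactor.mergeNodes, which can also merge relationships of the same type between the same nodes. The sketch below assumes such a release is installed; whether the properties and mergeRels options are available depends on the APOC version, so it is worth checking the documentation for the version in use.

```cypher
MATCH (oneNode:People {email_ID: 'Person2#test2.com'}), (otherNode:People {staysin: 'California'})
// Assumption: this APOC version supports the config map.
// properties: 'combine' keeps values from both nodes; mergeRels: true collapses parallel relationships.
CALL apoc.refactor.mergeNodes([oneNode, otherNode], {properties: 'combine', mergeRels: true})
YIELD node
RETURN node;
```

With this variant the separate clean-up of surplus HAS_MET relationships should not be needed, and it is not limited to one relationship type.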

Create multiple relationships of same type with different properties between two nodes from csv

I am facing an issue while creating multiple relationships of the same type with different properties between two nodes in Neo4j Desktop.
Nodes dataset:
File Name: 1.csv
File Contents:
Id,Desc
A,Alpha
B,Beta
C,Charlie
D,Doyce
Relationships Dataset:
File Name: 2.csv
File Contents:
SeqNo,Date,Count,Weight,From,To
0,2018-04-01,12,308,A,B
1,2018-04-01,3,475,B,C
2,2018-04-01,23,308,C,D
3,2018-04-01,32,524,D,A
4,2018-04-01,0,308,A,C
5,2018-04-01,23,237,B,D
6,2018-04-01,54,308,B,A
7,2018-04-01,23,237,D,B
8,2018-04-01,18,308,D,C
9,2018-04-01,23,308,C,A
10,2018-04-01,78,475,B,C
11,2018-04-01,67,308,A,B
12,2018-04-01,56,237,D,B
13,2018-04-01,34,308,A,C
14,2018-04-01,27,524,A,D
15,2018-04-01,84,237,D,B
// Create Nodes
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/1.csv" AS row
CREATE (:Node {Id: row.Id, Desc: row.Desc});
// Create Relationships
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/2.csv" AS row
MERGE (from:Node {Id: row.From})
MERGE (to:Node {Id: row.To})
MERGE (from)-[rel:RELATED_AS]->(to)
ON CREATE SET rel.SeqNo = toInt(row.SeqNo),
rel.Date = row.flightDate,
rel.Count = toInteger(row.Count),
rel.Weight = toFloat(row.Weight)
This syntax works but creates only 11 relationships, covering the incoming and outgoing relationships between each pair of nodes.
It ignores the additional relationships between A-B, B-C, A-C and D-B (which has 2 additional relationships).
How can I create the graph with all 16 relationships?
Thanks in advance.
Mel.
Your second query MERGEs the relationship (from)-[rel:RELATED_AS]->(to), so Cypher matches that pattern if it already exists. The subsequent rows for the same pair are therefore matched rather than created, and their values are never written because they are set only in the ON CREATE clause.
Since you want to create the relationships every time you could replace your statement with something like the following.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/2.csv" AS row
MERGE (from:Node {Id: row.From})
MERGE (to:Node {Id: row.To})
CREATE (from)-[rel:RELATED_AS {SeqNo: toInteger(row.SeqNo), Date: row.Date, Count: toInteger(row.Count), Weight: toFloat(row.Weight)}]->(to)
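One drawback of a plain CREATE is that running the import a second time doubles every relationship. If that matters, a variant that keeps MERGE but includes a distinguishing property in the relationship pattern might be an option. The sketch below assumes SeqNo is unique per CSV row, which holds for the sample data but is not stated in the question.

```cypher
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/2.csv" AS row
MERGE (from:Node {Id: row.From})
MERGE (to:Node {Id: row.To})
// Assumption: SeqNo is unique per row, so it can act as the relationship key.
MERGE (from)-[rel:RELATED_AS {SeqNo: toInteger(row.SeqNo)}]->(to)
ON CREATE SET rel.Date = row.Date,
              rel.Count = toInteger(row.Count),
              rel.Weight = toFloat(row.Weight)
```

Because each row now matches only a relationship with its own SeqNo, all 16 rows produce a relationship on the first run and none are added on a re-run.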

Multiple columns of data import per node?

Is it possible to have multiple columns of information on a Name node when importing a CSV? For example: the name is John Doe, together with his company, his position (President of Sales), his location (California), etc., all on a single node. If so, any suggestions on how to merge that information into a single Name node in Cypher during upload? Let's say I have columns of information such as Position, State, County, Phone. So far all I've been able to come up with is the Name and the relation to the Company that he/she works for.
LOAD CSV WITH HEADERS FROM 'FILE:///company_name.csv' AS line
MERGE (C:Company {Company: line.Company })
MERGE (N:Name {Name: line.Name })
MERGE (C)<-[:works_for]-(N);
The best approach would be to use ON CREATE SET and ON MATCH SET after creating/matching the node with MERGE on a unique key value.
LOAD CSV WITH HEADERS FROM 'FILE:///company_name.csv' AS line
MERGE (C:Company {Company: line.Company })
MERGE (N:Name {Name: line.Name })
ON CREATE SET
N.Position = line.Position,
N.Location = line.Location,
N.Country = line.Country,
N.Phone = line.Phone
ON MATCH SET
N.Position = line.Position,
N.Location = line.Location,
N.Country = line.Country,
N.Phone = line.Phone
MERGE (C)<-[:works_for]-(N);
Alternatively, you can set everything in a single MERGE. However, if multiple rows in your CSV file correspond to the same identity and some of the values you are setting differ between those rows, this will result in multiple nodes in the database.
LOAD CSV WITH HEADERS FROM 'FILE:///company_name.csv' AS line
MERGE (C:Company {Company: line.Company })
MERGE (N:Name {Name: line.Name, Position: line.Position, Location: line.Location, Country: line.Country, Phone: line.Phone })
MERGE (C)<-[:works_for]-(N);
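Since the ON CREATE SET and ON MATCH SET branches in the first answer query assign exactly the same values, a plain SET after the MERGE should behave the same way. A minimal sketch, reusing the column names from that answer (Location and Country are that answer's names, not necessarily the real CSV headers):

```cypher
LOAD CSV WITH HEADERS FROM 'FILE:///company_name.csv' AS line
MERGE (C:Company {Company: line.Company})
MERGE (N:Name {Name: line.Name})
// += adds or overwrites the listed properties and leaves any others untouched.
SET N += {Position: line.Position, Location: line.Location, Country: line.Country, Phone: line.Phone}
MERGE (C)<-[:works_for]-(N);
```

As with the ON MATCH SET version, a later row for the same Name overwrites the values written by an earlier one.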

Multiple relations like join tables (neo4j)

How do I express the following in neo4j?
match or create user bob; bob works at studio; while at studio, he's allowed to doodle; while at studio, he's also allowed to type.
Here's what I have:
MERGE (u:user {name:'bob'})
MERGE (c:company {name: 'studio'})
MERGE (u)-[:works_at]->(c)-[:allowed_to]->(p:permission {name:'doodle'})
MERGE (u)-[:works_at]->(c)-[:allowed_to]->(p:permission {name:'type'})
This doesn't work as permission becomes a relation of company.
Also, is it possible to chain relations such that:
MERGE work=(u)-[:works_at]->(c)
CREATE (work)-[:allowed_to]->(p:permission {name:'doodle'})
CREATE (work)-[:allowed_to]->(p:permission {name:'type'})
where you assign a relation to a variable to continue it later on in another query?
How about modelling it so the company grants the permission? Something like this...
MERGE (u:user {name:'bob'})
MERGE (c:company {name: 'studio'})
MERGE (u)-[:works_at]->(c)
MERGE (u)-[:allowed_to]->(p1:permission {name:'doodle'})<-[:GRANTS]-(c)
MERGE (u)-[:allowed_to]->(p2:permission {name:'type'})<-[:GRANTS]-(c)
RETURN *
You can't really refer to objects via identifiers/variables you have created previously in other queries. You would have to re-match or merge those previously created objects in your new query.
Part 2 could be modelled something like this..
MERGE (u:user {name:'bob'})
MERGE (c:company {name: 'studio'})
MERGE (u)-[:DOES]->(work:Work {start_date: timestamp()} )-[:AT]->(c)
CREATE (work)-[:allowed_to]->(p1:permission {name:'doodle'})
CREATE (work)-[:allowed_to]->(p2:permission {name:'type'})
As an alternative, if you never need to look up all users with a certain permission at a company, you could maintain a list of permissions as a relationship property.
MERGE (u:user {name:'bob'})
MERGE (c:company {name: 'studio'})
MERGE (u)-[r:works_at]->(c)
SET r.permissions = ['doodle', 'type']
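With that layout, checking a permission becomes a list membership test on the relationship. A small sketch, assuming the data was created by the query above:

```cypher
MATCH (u:user {name: 'bob'})-[r:works_at]->(c:company {name: 'studio'})
// IN tests whether the value is an element of the list property.
RETURN 'doodle' IN r.permissions AS canDoodle
```

The trade-off is the one mentioned above: finding every user with a given permission means inspecting every works_at relationship rather than following relationships from a permission node.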
