Multiple columns of data import per node? - neo4j

Is it possible to have multiple columns of information on a Name node when importing a csv? For example, Name is John Doe, Company, Position is President of Sales, Located in California, etc., on a single node. If so, any suggestions on how to merge that information into a single Name node in Cypher during upload? Let's say I have columns of information such as Position, State, County, Phone. So far all I've been able to come up with is the Name and the relation to the Company that he/she works for.
LOAD CSV WITH HEADERS FROM 'FILE:///company_name.csv' AS line
MERGE (C:Company {Company: line.Company })
MERGE (N:Name {Name: line.Name })
MERGE (C)<-[:works_for]-(N);

The best approach would be to use ON CREATE SET and ON MATCH SET after creating/matching the node with MERGE on a unique key value.
LOAD CSV WITH HEADERS FROM 'FILE:///company_name.csv' AS line
MERGE (C:Company {Company: line.Company })
MERGE (N:Name {Name: line.Name })
ON CREATE SET
N.Position = line.Position,
N.Location = line.Location,
N.Country = line.Country,
N.Phone = line.Phone
ON MATCH SET
N.Position = line.Position,
N.Location = line.Location,
N.Country = line.Country,
N.Phone = line.Phone
MERGE (C)<-[:works_for]-(N);
Alternatively, you can put every property into a single MERGE pattern. However, if multiple rows in your csv file correspond to the same identity and some of the values you are setting differ between those rows, this will result in multiple nodes in the database afterwards.
LOAD CSV WITH HEADERS FROM 'FILE:///company_name.csv' AS line
MERGE (C:Company {Company: line.Company })
MERGE (N:Name {Name: line.Name, Position: line.Position, Location: line.Location, Country: line.Country, Phone: line.Phone })
MERGE (C)<-[:works_for]-(N);
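The difference between the two approaches above can be sketched with a small pure-Python model of MERGE. This is only an illustration under simplifying assumptions: nodes are plain dicts, a pattern matches on exact property equality, and the sample rows (including the phone numbers) are made up for the example.

```python
# Toy model of the two import styles; no database involved.

def merge_on_key(rows, key):
    """MERGE on one key property, then SET the rest (ON CREATE / ON MATCH)."""
    nodes = {}
    for row in rows:
        node = nodes.setdefault(row[key], {key: row[key]})
        node.update(row)  # the most recent row overwrites earlier values
    return list(nodes.values())

def merge_on_all(rows):
    """MERGE with every column inside the pattern: any difference means a new node."""
    nodes = []
    for row in rows:
        if row not in nodes:
            nodes.append(dict(row))
    return nodes

# Hypothetical rows: same person, different phone number
rows = [
    {"Name": "John Doe", "Position": "President of Sales", "Phone": "555-0100"},
    {"Name": "John Doe", "Position": "President of Sales", "Phone": "555-0199"},
]

by_key = merge_on_key(rows, "Name")
by_all = merge_on_all(rows)
print(len(by_key), len(by_all))
```

With these rows the key-based variant ends up with one John Doe node carrying the last phone number, while the all-properties variant ends up with two nodes.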

Related

Merging Nodes in Neo4j

I am trying to merge two neo4j graphs using Cypher. The first one is the example of Countries and their Capitals. The second one is a sample example I created.
WITH "https://gist.githubusercontent.com/jimmycrequer/7aa867900d0cf0b9588d4354f09cb286/raw/countries.json" AS url
CALL apoc.load.json(url) YIELD value AS v
MERGE (c:Country {name: v.name})
SET c.population = v.population, c.area = v.area
CREATE (capital:City {name: v.capital})
CREATE (c)<-[:IS_CAPITAL_OF]-(capital)
FOREACH (n IN v.neighbors |
MERGE (neighbor:Country {name: n})
MERGE (c)-[:IS_NEIGHBOR_OF]-(neighbor)
)
To this, I'm trying to add my graph
//Manufacturers
MERGE (BMW:Manufacturer {name:"BMW" , headquarters :"Germany" , employees :100306,factories:25 ,revenue:95.8 ,production:1668982 ,sales: 1688982 })
MERGE(Germany:Country)-[:MANUFACTURERS]->(BMW)
The Node Germany has the following properties
id:103, area:357022, name:Germany, population:8288000
When I look at the final output, I see that an empty blank node was created for the [:MANUFACTURERS] relationship, along with the BMW node.
Change your second query a bit. Just because you name the node variable Germany, Neo4j doesn't know you want to match the country whose name property is Germany.
And in most cases you should merge or match the nodes first and only then add the relationship between the two.
MERGE (BMW:Manufacturer {name:"BMW" , headquarters :"Germany" , employees :100306,factories:25 ,revenue:95.8 ,production:1668982 ,sales: 1688982 })
MERGE (Germany:Country {name:'Germany'})
MERGE (Germany)-[:MANUFACTURERS]->(BMW)

How to create different relationships in neo4j using py2neo reading csv files?

I would like to read in a csv file where the first two columns have node names, and the third column has the node relationship. Currently I use this in py2neo:
query2 = """
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///data.csv" AS line
MERGE (topic:Topic {name: line.Topic})
MERGE (result:Result {name: line.Result})
CREATE UNIQUE (topic)-[:DISCUSSES]->(result)
"""
How can I use the third column in the csv file to set the relationship, instead of having all relationships set as "DISCUSSES"?
I tried this, but it does not have a UNIQUE option:
query1 = """
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///data.csv" AS line
MERGE (topic:Topic {name: line.Topic})
MERGE (result:Result {name: line.Result})
MERGE (relation:Relation {name: line.Relation})
WITH topic,result,line
CALL apoc.merge.relationship(topic, line.Relation, {}, {}, result) YIELD rel as rel1
RETURN topic,result
"""
Actually, your second query is almost correct (except that it has an extraneous MERGE clause). Here is the corrected query:
USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM "file:///data.csv" AS line
MERGE (topic:Topic {name: line.Topic})
MERGE (result:Result {name: line.Result})
WITH topic, result, line
CALL apoc.merge.relationship(topic, line.Relation, {}, {}, result) YIELD rel
RETURN topic, result
The apoc.merge.relationship call is equivalent to doing a MERGE to create the relationship (with a dynamic relationship type) if it does not already exist.
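If APOC is not available, one possible alternative (not part of the answer above) is to group the rows by the Relation column on the Python side and build one parameterized query per relationship type, since plain Cypher does not accept a parameter in the relationship-type position. The sketch below assumes the column names Topic, Result and Relation from the question; the helper names and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

def escape_type(name):
    # Backticks let Cypher accept arbitrary type names; a literal backtick is doubled.
    return "`" + name.replace("`", "``") + "`"

def queries_by_relation(csv_text):
    """Return a list of (cypher, parameters) pairs, one per distinct Relation value."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["Relation"]].append({"topic": row["Topic"], "result": row["Result"]})
    out = []
    for rel_type, pairs in groups.items():
        query = (
            "UNWIND $pairs AS pair\n"
            "MERGE (topic:Topic {name: pair.topic})\n"
            "MERGE (result:Result {name: pair.result})\n"
            f"MERGE (topic)-[:{escape_type(rel_type)}]->(result)"
        )
        out.append((query, {"pairs": pairs}))
    return out

# Hypothetical sample data, only to show the shape of the output
sample = "Topic,Result,Relation\nT1,R1,DISCUSSES\nT2,R1,CITES\nT3,R2,DISCUSSES\n"
generated = queries_by_relation(sample)
for query, params in generated:
    print(query, params, sep="\n")
```

Each (query, parameters) pair would then be handed to whatever call your driver offers for running a parameterized statement.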

Create multiple relationships of same type with different properties between two nodes from csv

I am facing an issue while creating multiple relationships of the same type with different properties between two nodes in Neo4j Desktop.
Nodes dataset:
File Name: 1.csv
File Contents:
Id,Desc
A,Alpha
B,Beta
C,Charlie
D,Doyce
Relationships Dataset:
File Name: 2.csv
File Contents:
SeqNo,Date,Count,Weight,From,To
0,2018-04-01,12,308,A,B
1,2018-04-01,3,475,B,C
2,2018-04-01,23,308,C,D
3,2018-04-01,32,524,D,A
4,2018-04-01,0,308,A,C
5,2018-04-01,23,237,B,D
6,2018-04-01,54,308,B,A
7,2018-04-01,23,237,D,B
8,2018-04-01,18,308,D,C
9,2018-04-01,23,308,C,A
10,2018-04-01,78,475,B,C
11,2018-04-01,67,308,A,B
12,2018-04-01,56,237,D,B
13,2018-04-01,34,308,A,C
14,2018-04-01,27,524,A,D
15,2018-04-01,84,237,D,B
// Create Nodes
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/1.csv" AS row
CREATE (:Node {Id: row.Id, Desc: row.Desc});
// Create Relationships
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/2.csv" AS row
MERGE (from:Node {Id: row.From})
MERGE (to:Node {Id: row.To})
MERGE (from)-[rel:RELATED_AS]->(to)
ON CREATE SET rel.SeqNo = toInt(row.SeqNo),
rel.Date = row.flightDate,
rel.Count = toInteger(row.Count),
rel.Weight = toFloat(row.Weight)
This syntax works and creates only 11 relationships, with incoming and outgoing relationships between two nodes.
It is ignoring the additional relationships between A-B, B-C, A-C and D-B (2 additional relationships).
How to create the graph with all the 16 relationships?
Thanks in advance.
Mel.
Your second query MERGEs the relationship (from)-[rel:RELATED_AS]->(to), so Cypher matches that pattern if it already exists. Later rows for the same pair of nodes are therefore matched rather than created, and their values are never written because of the ON CREATE clause.
Since you want to create a relationship for every row, you could replace your statement with something like the following.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/2.csv" AS row
MERGE (from:Node {Id: row.From})
MERGE (to:Node {Id: row.To})
CREATE (from)-[rel:RELATED_AS {SeqNo: toInteger(row.SeqNo), Date: row.Date, Count: toInteger(row.Count), Weight: toFloat(row.Weight)}]->(to)
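The counts from the question can be reproduced without a database. The sketch below is a rough model that treats a MERGE on (from)-[:RELATED_AS]->(to) as "one relationship per distinct ordered pair" and a CREATE as "one relationship per row". The third count, a MERGE that also includes SeqNo in the relationship pattern, is an assumption added for comparison and is not part of the answer above.

```python
import csv
import io

DATA = """SeqNo,Date,Count,Weight,From,To
0,2018-04-01,12,308,A,B
1,2018-04-01,3,475,B,C
2,2018-04-01,23,308,C,D
3,2018-04-01,32,524,D,A
4,2018-04-01,0,308,A,C
5,2018-04-01,23,237,B,D
6,2018-04-01,54,308,B,A
7,2018-04-01,23,237,D,B
8,2018-04-01,18,308,D,C
9,2018-04-01,23,308,C,A
10,2018-04-01,78,475,B,C
11,2018-04-01,67,308,A,B
12,2018-04-01,56,237,D,B
13,2018-04-01,34,308,A,C
14,2018-04-01,27,524,A,D
15,2018-04-01,84,237,D,B
"""

rows = list(csv.DictReader(io.StringIO(DATA)))

# MERGE on the bare pattern: duplicates of a (From, To) pair collapse
merged = {(r["From"], r["To"]) for r in rows}
# CREATE: every row yields its own relationship
created = [(r["From"], r["To"]) for r in rows]
# Hypothetical variant: MERGE with SeqNo inside the relationship pattern
merged_with_seq = {(r["From"], r["To"], r["SeqNo"]) for r in rows}

print(len(merged), len(created), len(merged_with_seq))  # 11 16 16
```

The variant keyed on SeqNo keeps all 16 relationships while staying safe to re-run, which plain CREATE is not: running the CREATE import twice would produce 32 relationships.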

Upload csv with multiple relations

I'm trying to build relationships between a list of companies (with employee hierarchy) that are partnered together in an ecosystem in addition to investors. I have 6 columns in my csv for: Company, Investor, Customer (J labeled as a company but for relationship customer), CompanyX (X labeled as a company but for relationship for partner companies), Employee (for employees), and EmployeeL (L for hierarchy).
LOAD CSV WITH HEADERS FROM 'FILE:///ecosystem.csv' AS line
MERGE (C:Company {Company: line.Company })
MERGE (I:Investor {Investor: line.Investor })
MERGE (J:Customer {Company: line.Company })
MERGE (X:CompanyX {Company: line.Company })
MERGE (N:Employee {Employee: line.Employee })
MERGE (L:EmployeeL {Employee: line.Employee })
MERGE (C)<-[:works_for]-(N)
MERGE (L)<-[:reports_to]-(N)
MERGE (J)<-[:Customer]-(C)
MERGE (X)<-[:Partners]->(C)
MERGE (C)<-[:Investor]-(I);
Am I overcomplicating this? I'm new to Cypher and I'm not sure I'm doing this right; the last time I did an upload similar to this I had to wipe my database clean. Also, how do I input a null value for J/I/C, since not all hierarchies are complete? When there is a null value, I am unable to upload the csv.
If you have only three columns in your csv, I would create a query like this:
LOAD CSV WITH HEADERS FROM 'FILE:///ecosystem.csv' AS line
MERGE (C:Company {name: line.Company })
MERGE (I:Investor {name: line.Investor })
MERGE (N:Employee {name: line.Employee })
MERGE (C)<-[:Investor]-(I)
MERGE (C)<-[:works_for]-(N)
You should avoid using bidirectional relationships such as :works_for and :reports_to. Check this article.
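The null-value part of the question is not covered above. MERGE rejects a pattern whose property value is null, which is why rows with empty cells stop the upload. One way around it, sketched here as an assumption rather than as the answerer's method, is to read the csv in Python and build, for each row, only the clauses whose columns are filled in. The column names follow the three-column query above; the function name and sample rows are hypothetical.

```python
import csv
import io

COLUMNS = ("Company", "Investor", "Employee")

def statement_for(row):
    """Build a (cypher, parameters) pair for one row, skipping empty columns.

    Returns None when the row has no Company, since every clause hangs off it.
    """
    present = {k: row[k] for k in COLUMNS if row.get(k)}
    if "Company" not in present:
        return None
    clauses = ["MERGE (c:Company {name: $Company})"]
    if "Investor" in present:
        clauses += ["MERGE (i:Investor {name: $Investor})",
                    "MERGE (c)<-[:Investor]-(i)"]
    if "Employee" in present:
        clauses += ["MERGE (n:Employee {name: $Employee})",
                    "MERGE (c)<-[:works_for]-(n)"]
    return "\n".join(clauses), present

# Hypothetical sample: the second row has no investor, the third has no company
sample = "Company,Investor,Employee\nAcme,Big Fund,Jane\nAcme,,John\n,Big Fund,Ann\n"
statements = [statement_for(r) for r in csv.DictReader(io.StringIO(sample))]
for item in statements:
    print(item)
```

Rows with a missing investor still create the company and employee, and rows without a company are skipped instead of aborting the whole import.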

Neo4j - LOAD-CSV not creating all nodes

I am just getting started on Neo4J, and I am trying to load some data into Neo4j 3.1 using LOAD CSV with the following script:
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///Fake59.csv" AS line
MERGE (person:Person {firstName: line.GivenName, middleInitial: line.MiddleInitial, lastName: line.Surname, title: line.Title,
gender: line.Gender, birthday: line.Birthday, bloodType: line.BloodType, weight: line.Pounds, height: line.FeetInches})
MERGE (contact:Contact {phoneNumber: line.TelephoneNumber, email: line.EmailAddress})
MERGE (person)-[:CONTACTED_AT]->(contact)
MERGE (color:Color {name: line.Color})
MERGE (person)-[:FAVORITE_COLOR]->(Color)
MERGE (address:Address {streetAddress: line.StreetAddress, city: line.City, zipCode: line.ZipCode})
MERGE (person)-[:LIVES_AT]->(address)
MERGE (state:State {abbr: line.State, name: line.StateFull})
MERGE (city)-[:STATE_OF]->(stage)
MERGE (country:Country {name: line.CountryFull, abbr: line.Country, code: line.TelephoneCountryCode})
MERGE (state)-[:IN_COUNTRY]->(country)
MERGE (credentials:Credentials {userName: line.Username, password: line.Password, GUID: line.GUID})
MERGE (person)-[:LOGS_in]->(credentials)
MERGE (browser:Browser {agent: line.BrowserUserAgent})
MERGE (person)-[:BROWSES_WITH]->(browser)
MERGE (creditCard:CreditCard {number: line.CCNumber, cvv2: line.CVV2, expireDate: line.CCExpires})
MERGE (person)-[:USES_CC]->(creditCard)
MERGE (creditCompany:CreditCompany {name: line.CCType})
MERGE (creditCard)-[:MANAGED_BY]->(creditCompany)
MERGE (occupation:Occupation {name: line.Occupation})
MERGE (person)-[:WORKS_AS]->(occupation)
MERGE (company:Company {name: line.Company})
MERGE (person)-[:WORKDS_FOR]->(company)
MERGE (company)-[:EMPLOYES]->(occupation)
MERGE (vehicle:Vehicle {name: line.Vehicle})
MERGE (person)-[:DRIVES]->(vehicle)
The input file has about 50k rows. It runs for a few hours and the process does not finish, but if I query the database after that time I see that only the Person nodes got created. If I run a smaller file with only 3 entries, all the additional nodes and relationships are created.
I have already changed the amount of memory allocated to Neo4j and to the JVM, and still no success. I understand that MERGE takes longer than CREATE to be executed but I am trying to avoid duplication of nodes with the insert.
Any ideas or suggestions on what I should change or how I can improve this ?
Thank you,
--MD.
Try splitting your query into multiple smaller ones. That works better and is easier to manage. Also, when using MERGE you should typically do it on a single unique property, like an email for a person, and then use ON CREATE SET for the rest. This should speed up the query. It looks like this:
MERGE (contact:Contact {email: line.EmailAddress})
ON CREATE SET contact.phoneNumber = line.TelephoneNumber
In your case with the person where there is no single unique property you can use a combination of many, but know that every property you add in the MERGE slows down the query.
MERGE (person:Person {firstName: line.GivenName, middleInitial: line.MiddleInitial, lastName: line.Surname})
ON CREATE SET person.title = line.Title, person.gender = line.Gender,
person.birthday = line.Birthday, person.bloodType = line.BloodType,
person.weight = line.Pounds, person.height = line.FeetInches
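As a rough sketch of what "multiple smaller queries" could look like, the snippet below builds three separate LOAD CSV passes over the same file as Python strings: one for Contact nodes, one for Person nodes, and one that only matches existing nodes and merges the relationship. The split into exactly these passes and the variable names are assumptions for illustration; the remaining node types from the question would follow the same pattern.

```python
# Shared prefix, copied from the question's script
LOAD = (
    "USING PERIODIC COMMIT 1000\n"
    'LOAD CSV WITH HEADERS FROM "file:///Fake59.csv" AS line\n'
)

PERSON_KEY = (
    "{firstName: line.GivenName, middleInitial: line.MiddleInitial, "
    "lastName: line.Surname}"
)

PASSES = [
    # Pass 1: Contact nodes, merged on email only
    "MERGE (contact:Contact {email: line.EmailAddress})\n"
    "ON CREATE SET contact.phoneNumber = line.TelephoneNumber",
    # Pass 2: Person nodes, merged on the name combination
    f"MERGE (person:Person {PERSON_KEY})\n"
    "ON CREATE SET person.title = line.Title, person.gender = line.Gender",
    # Pass 3: relationships only; both nodes already exist, so MATCH them
    f"MATCH (person:Person {PERSON_KEY})\n"
    "MATCH (contact:Contact {email: line.EmailAddress})\n"
    "MERGE (person)-[:CONTACTED_AT]->(contact)",
]

queries = [LOAD + body for body in PASSES]
for q in queries:
    print(q, end="\n\n")
```

Each string is meant to be run as its own statement, so a failure or slowdown in one pass is easy to isolate.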
