CSV data import in Neo4j

I am trying to add relationships between existing Employee nodes in my sample database from a CSV file using the following commands:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM
'file:///newmsg1.csv' AS line
WITH line
MATCH (e:Employee {mail: line.fromemail}), (b:Employee {mail: line.toemail})
CREATE (e)-[m:Message]->(b);
The problem I am facing is that, although the CSV file has only 71253 entries, each with a "fromemail" and a "toemail",
the output is "Created 240643 relationships, completed after 506170 ms." I cannot see what I am doing wrong. Kindly help me. Thanks in advance!

You can use MERGE to ensure uniqueness of relationships:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM
'file:///newmsg1.csv' AS line
WITH line
MATCH (e:Employee {mail: line.fromemail}), (b:Employee {mail: line.toemail})
MERGE (e)-[m:Message]->(b);

Try changing your CREATE to CREATE UNIQUE (available in Neo4j 3.x and earlier; it was removed in Neo4j 4.0 in favour of MERGE):
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM
'file:///newmsg1.csv' AS line
WITH line
MATCH (e:Employee {mail: line.fromemail}), (b:Employee {mail: line.toemail})
CREATE UNIQUE (e)-[m:Message]->(b);
From the docs:
CREATE UNIQUE is in the middle of MATCH and CREATE — it will match
what it can, and create what is missing. CREATE UNIQUE will always
make the least change possible to the graph — if it can use parts of
the existing graph, it will.
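Getting 240643 relationships from 71253 rows would be consistent with some mail values being shared by several Employee nodes, because each row then matches more than one (e, b) pair. That is only an assumption about the data; the question does not confirm it. A minimal sketch of a check, using the label and property from the question:

```cypher
// List mail values that are carried by more than one Employee node.
MATCH (e:Employee)
WITH e.mail AS mail, count(*) AS copies
WHERE copies > 1
RETURN mail, copies
ORDER BY copies DESC
LIMIT 25;
```

If this returns rows, the duplicates would need to be cleaned up before the import. Note also that MERGE keeps a single Message relationship per pair of nodes, so repeated messages between the same two employees collapse into one.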

Related

Neo4J - unable to create relationships (30,000)

I've got two CSV files, Job (30,000 entries) and Cat (30 entries), imported into Neo4j and am trying to create a relationship between them.
Each Job has a cat_ID, and Cat contains the category name and ID.
After executing the following:
LOAD CSV WITH HEADERS FROM 'file:///DimCategory.csv' AS row
MATCH (job:Job {cat_ID: row.cat_ID})
MATCH (cat:category {category: row.category})
CREATE (job)-[r:under]->(cat)
it returns (no changes, no records).
I received a prompt recommending that I index the category, so I ran
Create INDEX ON :Job(cat_id); but I still get the same result.
How do I create a relationship between the two?
I am able to get this to work on a smaller dataset.
You are probably trying to match on non-existing nodes. Try
LOAD CSV WITH HEADERS FROM 'file:///DimCategory.csv' AS row
MERGE (job:Job {cat_ID: row.cat_ID})
MERGE (cat:category {category: row.category})
CREATE (job)-[r:under]->(cat)
Have a look in your logs and see if you are running out of memory.
You could try chunking the data set up into smaller pieces with Periodic Commit and see if that helps:
:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///DimCategory.csv' AS row
MATCH (job:Job {cat_ID: row.cat_ID})
MATCH (cat:category {category: row.category})
CREATE (job)-[r:under]->(cat)
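Before changing the import, it may help to see which side of the MATCH fails. The sketch below only reads data, and it assumes the CSV really has the cat_ID and category headers used in the question. Two things to keep in mind when reading the result: property keys are case-sensitive (the index in the question is on cat_id, while the query matches on cat_ID), and LOAD CSV delivers every value as a string, so a cat_ID stored as a number on the Job nodes will not equal row.cat_ID.

```cypher
// For each CSV row, count how many Job and category nodes it can find.
LOAD CSV WITH HEADERS FROM 'file:///DimCategory.csv' AS row
OPTIONAL MATCH (job:Job {cat_ID: row.cat_ID})
OPTIONAL MATCH (cat:category {category: row.category})
RETURN row.cat_ID AS csv_cat_ID,
       count(DISTINCT job) AS jobs,
       count(DISTINCT cat) AS cats;
```

A zero in the jobs or cats column points at the pattern that does not match.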

Cypher Import from CSV to Neo4J - How To Improve Performance

I am importing the following to Neo4J:
categories.csv
CategoryName1
CategoryName2
CategoryName3
...
categories_relations.csv
category_parent category_child
CategoryName3 CategoryName10
CategoryName32 CategoryName41
...
Basically, categories_relations.csv shows parent-child relationships between the categories from categories.csv.
I imported the first csv file with the following query which went well and pretty quickly:
USING PERIODIC COMMIT
LOAD CSV FROM 'file:///categories.csv' as line
CREATE (:Category {name:line[0]})
Then I imported the second csv file with:
USING PERIODIC COMMIT
LOAD CSV FROM 'file:///categories_relations.csv' as line
MATCH (a:Category),(b:Category)
WHERE a.name = line[0] AND b.name = line[1]
CREATE (a)-[r:ISPARENTOF]->(b)
I have about 2 million nodes.
I tried executing the 2nd query and it is taking quite long. Can I make the query execute more quickly?
Confirm that you are matching on the right property. You set only one property on each Category node, name, when creating
the categories, so the second query has to match on name as well. Matching on a property that was never set (such as id)
finds no nodes, and no relationships are created.
To make the second query faster, add an index on the property you match Category nodes on (here name):
CREATE INDEX ON :Category(name)
If it still takes time, you can refer to my answer on LOAD CSV here
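The second query in the question matches Category nodes on name, so that is the property an index has to cover. A sketch in the index syntax of Neo4j 4.x and later (the index name category_name is an arbitrary choice, not from the original), followed by the same relationship query written with two separate MATCH clauses:

```cypher
// Index on the property used for matching; "category_name" is a made-up index name.
CREATE INDEX category_name FOR (c:Category) ON (c.name);

// Run separately, once the index is online.
LOAD CSV FROM 'file:///categories_relations.csv' AS line
MATCH (a:Category {name: line[0]})
MATCH (b:Category {name: line[1]})
CREATE (a)-[:ISPARENTOF]->(b);
```

On versions that support it, the import can still be prefixed with USING PERIODIC COMMIT as in the question. Prefixing the import with EXPLAIN shows whether the planner uses an index seek rather than a label scan.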

Merging with null values

So I've been trying to load a CSV file in which participants rated whom they would get advice from or talk to when they have problems with studying. The table looks something like this:
The letters are just the names of the people. As you can see, there are nulls in this table. I'm trying to load this into Neo4j so we can visualise who is choosing whom and whether the relationship is reciprocal. Any ideas? All help is much appreciated!
Using IS NOT NULL can solve your problem.
LOAD CSV WITH HEADERS FROM 'file:///xyz.csv' AS line
WITH line LIMIT 10
RETURN line
With this you can see how your data is being loaded (don't forget to use LIMIT). Since all the values loaded from CSV are in string format, you'll get your empty column values as this -> "".
From that you can create your nodes by following the blog I've referenced. With IS NOT NULL you can also skip the null values while creating your schema.
Example (assuming the CSV headers are Person and Study1 to Study5):
MERGE (n:Person{name:line.Person})-[:CHOSE]-(:Study1{name:line.Study1})
MERGE (n)-[:CHOSE]-(:Study2{name:line.Study2})
MERGE (n)-[:CHOSE]-(:Study3{name:line.Study3})
MERGE (n)-[:CHOSE]-(:Study4{name:line.Study4})
MERGE (n)-[:CHOSE]-(:Study5{name:line.Study5})
Or you can alias the columns first and filter out rows with a missing value:
WITH line.Person AS Person, line.Study1 AS Study1, line.Study2 AS Study2, line.Study3 AS Study3, line.Study4 AS Study4, line.Study5 AS Study5
WHERE Study5 IS NOT NULL
MERGE (n:Person{name:Person})-[:CHOSE]-(:Study1{name:Study1})
MERGE (n)-[:CHOSE]-(:Study2{name:Study2})
MERGE (n)-[:CHOSE]-(:Study3{name:Study3})
MERGE (n)-[:CHOSE]-(:Study4{name:Study4})
MERGE (n)-[:CHOSE]-(:Study5{name:Study5})
For more detail go through this example.
Hope this helps!
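Since the goal in the question is to see who chooses whom and whether a choice is reciprocal, one option is to model every chosen name as a Person as well and to give the relationship a direction. The table itself is not shown, so the column names Person and Study1 below are assumptions, as is the idea that each StudyN column holds a person's name. A minimal sketch for one column:

```cypher
// Column names Person and Study1 are assumed; adjust them to the real headers.
LOAD CSV WITH HEADERS FROM 'file:///xyz.csv' AS line
WITH line
WHERE line.Person IS NOT NULL AND line.Study1 IS NOT NULL
MERGE (p:Person {name: line.Person})
MERGE (q:Person {name: line.Study1})
MERGE (p)-[:CHOSE]->(q);
```

Merging the two nodes separately before merging the relationship avoids creating a second Person node for a name that already exists. The same statement can be repeated for the remaining columns.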

Neo4j relationship writing process

I am using Neo4j to create a graph database and the LOAD CSV command to create relationships. It takes 2 hours to load 1 million rows into any relationship. Is there a faster way to create relationships?
CREATE is faster than MERGE, and using MERGE or MATCH can result in an 'Eager' operation. Please go through this blog for more reference.
As a workaround you can try the query below.
You can use WITH between the two MATCH clauses to avoid a Cartesian product; row has to be passed along in the WITH so that the second MATCH can still read it. Try adding an index for "indexed_date" and then run the query below.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///raw.csv" AS row
MATCH (tweet_id:tweet_id {name: row.tweet_id})
WITH tweet_id, row
MATCH (indexed_date:indexed_date {name: row.indexed_date})
CREATE (indexed_date)-[date_i_tweet:date_i_tweet]->(tweet_id);
Hope this helps
For your query, you should have:
a unique constraint on tweet_id: CREATE CONSTRAINT ON (n:tweet_id) ASSERT n.tweet_id IS UNIQUE
a unique constraint on indexed_date: CREATE CONSTRAINT ON (n:indexed_date) ASSERT n.indexed_date IS UNIQUE
Cheers

How to import csv data to create nodes with specified properties in neo4j

I have to create 30+ nodes with the CSV data below as properties:
id,name,skill,cur_company,pre_company,college,location
1,"pavan","java","CGI","CSC","JNTU","HYDERABAD"
2,"ravi","java","TCS","CSC","SGPL","DELHI"
...
How can I create nodes by importing the data above? For example:
u1:User {id:1,name:"pavan",skill:"java",cur_company:"CGI",prev_company:"CSC",location:"HYDERABAD"}
u2:User {id:2,name:"ravi",skill:"java",cur_company:"TCS",prev_company:"CSC",location:"DELHI"}
There is a dedicated LOAD CSV command in Cypher:
load csv with headers from "file-url" as data
create (u:User) set u = data
or
load csv with headers from "file-url" as data
create (u:User {id:data.id, name:data.name, ....})
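Before running the commands above, make the following changes in the neo4j.conf file:
Comment out the dbms.security.allow_csv_import_from_file_urls=true line (so that it reads #dbms.security.allow_csv_import_from_file_urls=true)
and
uncomment the dbms.directories.import=import line, so that file:/// URLs are resolved relative to the import directory.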
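LOAD CSV reads every field as a string, so an id of 1 in the file arrives as "1". If the nodes should carry a numeric id, as in the question's example, the value can be converted during the import. A sketch using the column names from the question's header row; the file name users.csv is a placeholder and is assumed to sit in the import directory:

```cypher
// 'file:///users.csv' is a placeholder path, not from the original post.
LOAD CSV WITH HEADERS FROM 'file:///users.csv' AS data
CREATE (u:User {
  id: toInteger(data.id),          // convert the string "1" to the integer 1
  name: data.name,
  skill: data.skill,
  cur_company: data.cur_company,
  pre_company: data.pre_company,
  college: data.college,
  location: data.location
});
```

Listing the properties explicitly also makes it easy to rename a column on the way in, for example storing pre_company under a different key.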

Resources