Bulk Update neo4j relationship properties through csv - neo4j

I have a CSV file with 3 columns:
Follower_id,Following_id,createTime
Each node in Neo4j represents a user and has several properties, one of which is profileId. Two nodes in the graph can be connected by a FOLLOW_RELATIONSHIP, and I have to update the createTime property on each FOLLOW_RELATIONSHIP. There are lots of relationships in the graph. I am new to Neo4j and don't have much idea of how to do a bulk update efficiently.

You can try something like this:
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'FILEPATH' AS row
MATCH (u1:User {profileId: row.Follower_id})
MATCH (u2:User {profileId: row.Following_id})
MERGE (u1)-[r:FOLLOW_RELATIONSHIP]->(u2)
SET r.createTime = row.createTime
FILEPATH is the location of the file, usually a path inside the database's import directory or a web URL. You can learn how to configure it from this article.
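If the MATCH lookups are slow, it usually helps to create an index or unique constraint on :User(profileId) before running the load, so each row is looked up via the index instead of a label scan. A minimal sketch, assuming profileId is meant to be unique per user (pre-Neo4j 4.4 constraint syntax):
// Run once before the LOAD CSV; speeds up the MATCH on profileId.
CREATE CONSTRAINT ON (u:User) ASSERT u.profileId IS UNIQUE;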

Related

Skipping relationship creation if it already exists, not about MERGE

I am new to Neo4j. My data is in CSV files, and I am trying to load them into the database and create relationships.
departments.csv(9 rows)
dept_name
dept_no
dept_emp.csv(331603 rows)
dept_no
emp_no
from_date
to_date
I have created nodes with labels dept_emp and departments, with all columns as properties. Now I am trying to create relationships between them.
CALL apoc.periodic.iterate("
load csv with headers from 'file:///dept_emp.csv' as row return row",
"match(de:dept_emp)
match(d:departments)
where de.dept_no=row.dept_no and d.dept_no= row.dept_no
merge (de)-[:BELONGS_TO]->(d)",{batchSize:10000, parallel:false})
I do have indexes on :dept_emp and :departments.
When I run this it takes ages to complete (many days). When I change the batch size to 10 it creates all 331603 relationships, but it keeps running until it completes all the batches, which takes too long. Because it encounters all 9 distinct dept_no values in the initial rows of dept_emp.csv, it creates all the relationships early on, yet it still has to work through every batch, and in each batch it has to re-scan the 331603 relationships that were created in the first couple of batches. Please help me optimize this.
I used apoc.periodic.iterate so it can handle larger data in the future; it is how the data is related and how I am trying to establish the relationship that is causing the problem. Each department will have many dept_emp nodes connected to it.
I am currently using Neo4j 4.2.1.
The max heap size is 1G due to my laptop's limitations.
There's no need to create nodes in this fashion, i.e. setting every column as a property and then loading the same CSV again while matching all nodes in the graph and doing a cartesian join.
Instead:
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM 'file:///departments.csv' AS row
CREATE (d:Department) SET d.deptNo = row.dept_no, d.name = row.dept_name;

USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM 'file:///dept_emp.csv' AS row
MATCH (d:Department {deptNo: row.`dept_no`})
WITH d, row
MERGE (e:Employee {empNo: row.`emp_no`})
MERGE (e)-[:BELONGS_TO]->(d);
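It also helps to add unique constraints before running these loads, so the MATCH and MERGE lookups use an index instead of scanning every node. A minimal sketch, assuming dept_no and emp_no are unique identifiers (pre-Neo4j 4.4 constraint syntax):
// Run once, before the LOAD CSV statements above.
CREATE CONSTRAINT ON (d:Department) ASSERT d.deptNo IS UNIQUE;
CREATE CONSTRAINT ON (e:Employee) ASSERT e.empNo IS UNIQUE;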

Bulk Insertion in Py2neo

I'm writing a custom doc manager for mongo-connector to replicate MongoDB documents to Neo4j, and I would like to create relationships in bulk. I'm using py2neo 2020.0.
It seems there were some options for this in previous versions but not in this one. Is there any way to create nodes and relationships in bulk with py2neo?
I am currently working on bulk load functionality. There will be some new functions available in the next release. Until then, Cypher UNWIND...CREATE queries are your best bet for performance.
I would strongly recommend switching to the neo4j Python driver, as it's supported by Neo4j directly.
In any case, you can also do bulk insert directly in Cypher, and/or call that Cypher from within Python using the neo4j driver.
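For example, a minimal sketch of the UNWIND ... CREATE pattern mentioned above, where $rows is a list-of-maps parameter passed in from the driver (the label and property names here are placeholders):
// $rows is a parameter such as [{id: 1, my_field2: "a"}, {id: 2, my_field2: "b"}, ...]
UNWIND $rows AS row
CREATE (n:my_label {id: row.id, my_field2: row.my_field2})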
I recommend importing the nodes first, and then the relationships. It helps if you have a guaranteed unique identifier for the nodes, because then you can set up an index on that property before loading. Then you can load nodes from a CSV (or better yet a TSV) file like so:
// Create constraint on the unique ID - greatly improves performance.
CREATE CONSTRAINT ON (a:my_label) ASSERT a.id IS UNIQUE
;
// Load the nodes, along with any properties you might want, from
// a file in the Neo4j import folder.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///my_nodes.tsv" AS tsvLine FIELDTERMINATOR '\t'
CREATE (:my_label{id: toInteger(tsvLine.id), my_field2: tsvLine.my_field2})
;
// Load relationships.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///my_relationships.tsv" AS tsvLine FIELDTERMINATOR '\t'
MATCH(parent_node:my_label)
WHERE parent_node.id = toInteger(tsvLine.parent)
MATCH(child_node:my_label)
WHERE child_node.id = toInteger(tsvLine.child)
// CREATE requires a relationship type; PARENT_OF is a placeholder name.
CREATE (parent_node)-[:PARENT_OF]->(child_node)
;

How to add a new record into a existing labelled node in neo4j GraphDB reading from csv file

I am trying to add a new record, i.e. a whole CSV row, as a node under an existing label in my Neo4j graph database. Let's say I have a node labelled Customer:
n
{"DISTRICT": "abc", "THANA": "xyzzy", "DIVISION": "abc", "REGDATE": "1-2-2015", "ID": "0123"}
I want to add another row, with these same fields and the relevant values, by reading from a CSV file. There is a lot of data, so I think APOC with periodic iteration would be a good idea for processing it in parallel, but I am confused about how to add a whole row as a node under an existing label. I have learnt to update property information through the MERGE ... ON CREATE SET / ON MATCH SET approach, but I can't figure out how to add a new record under an existing label. I expect to end up with a new record carrying the Customer label. Kindly help me solve this.
Here is an example of how to use LOAD CSV to create your neo4j data from a CSV file. Please pay attention to the Introduction section of the docs for important info on how to configure the neo4j server and for where to store the CSV file (if you want to use a local file).
Suppose your data is in a input.csv file that starts with a header row, like this:
DISTRICT,THANA,DIVISION,REGDATE,ID
abc,xyzzy,abc,1-2-2015,0123
def,foobar,nbc,1-3-2015,0124
This query should then create one Customer node per file row:
LOAD CSV WITH HEADERS FROM 'file:///input.csv' AS row
CREATE (c:Customer)
SET c = row
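Since the question mentions APOC and batching, here is a sketch of the same load wrapped in apoc.periodic.iterate, assuming the ID column uniquely identifies a Customer (so re-running the import won't create duplicate nodes):
CALL apoc.periodic.iterate(
  "LOAD CSV WITH HEADERS FROM 'file:///input.csv' AS row RETURN row",
  "MERGE (c:Customer {ID: row.ID}) SET c += row",
  {batchSize: 10000, parallel: false})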

Cypher Query endless loop

I am new to graph databases and especially Cypher, and I am importing data from my CSV. I pulled some country data, added the cities and states, and am now pushing the data for the areas:
LOAD CSV WITH HEADERS FROM
"file:///X:/loc.csv" as csvRow
MATCH (ct:city {poc:csvRow.poc})
MERGE (loc:area {eoc: csvRow.eoc, name:csvRow.loc_nme, name_wr:replace(csvRow.loc_nme," ","")})
MERGE (loc)-[:exists_inside]->(ct)
I've already pushed the city and country data using the same kind of query and built the relationship between them too.
But when I try to create the areas inside the cities, it just keeps going; there is no stopping it (15 minutes have passed).
There are 7000 cities in the data I got from the internet and 90k areas inside those cities.
Is it just taking time, or have I messed up the query?
After the Update
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM
"file:///X:/loc.csv" as csvRow
MATCH (ct:city {poc:csvRow.poc})
MERGE (loc:area {eoc: csvRow.eoc, name:csvRow.loc_nme, name_wr:replace(csvRow.loc_nme," ","")})
MERGE (loc)-[:exists_inside]->(ct)
Okay, your query plan shows NodeByLabelScans and filters are being used to find your nodes, which means that every time you match or merge to a node, it has to scan all nodes with the given labels and perform property access on all of them to find the nodes you're looking for.
You need to add indexes (or unique constraints, depending on if the field is supposed to be unique) on the relevant label/property combinations so those lookups will be quick.
So you'll need one on :city(poc), and probably one on :area(eoc), assuming those properties refer to unique values.
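A sketch of the index statements, using the older CREATE INDEX ON syntax (Neo4j 4.x also supports CREATE INDEX FOR (n:city) ON (n.poc); use a unique constraint instead if the values are meant to be unique):
CREATE INDEX ON :city(poc);
CREATE INDEX ON :area(eoc);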
EDIT
One other big thing I initially missed: you need to add USING PERIODIC COMMIT before the LOAD CSV so the load batches its writes to the db. That should do the trick here.

Faster way to insert data in neo4j?

I am trying to insert unique nodes and relationships into Neo4j.
What I am using:
Neo4j Community Edition running on Amazon EC2 [Amazon Linux m3.large]
Neo4j Java Rest Binding [ https://github.com/neo4j-contrib/java-rest-binding ]
Data size and type:
Multiple TSV files, each containing more than 8 million lines [each line represents a node or a relationship]. There are more than 10 files for nodes [= 2 million nodes] and another 2 million relationships.
I am using UniqueNodeFactory for inserting nodes, and inserting sequentially; I couldn't find any way to insert in batches while preserving unique nodes.
The problem is that it takes a huge amount of time to insert the data. For example, it took almost a day to insert 0.3 million unique nodes. Is there any way to speed up the insertion?
Don't do that.
Java-REST-Binding was never made for that.
Use LOAD CSV instead:
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "http://some.url" as line
CREATE (u:User {name:line.name})
You can also use MERGE (with constraints), create relationships, etc.
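For example, a minimal sketch of a MERGE-based relationship import with a constraint, assuming a hypothetical follows.csv with follower and followee name columns:
CREATE CONSTRAINT ON (u:User) ASSERT u.name IS UNIQUE;

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "http://some.url/follows.csv" AS line
MATCH (a:User {name: line.follower})
MATCH (b:User {name: line.followee})
MERGE (a)-[:FOLLOWS]->(b);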
See my blog post for an example: http://jexp.de/blog/2014/06/using-load-csv-to-import-git-history-into-neo4j/
Or the Neo4j Manual: http://docs.neo4j.org/chunked/milestone/cypherdoc-importing-csv-files-with-cypher.html
