I'm migrating data from an RDBMS to Neo4j. I used the 'neo4j-admin import' tool for the initial bulk import from a CSV dump. To accommodate live updates, I'm taking another CSV dump from the RDBMS, and this time I'm using 'apoc.load.csv'.
I've my data in one file, say 'upd_product.csv'
And I've the headers in 'product_h.csv'
Now I want to use apoc.load.csv
Call apoc.load.csv('/upd_product.csv') yield map, list
Match (p: Product {id: list[0]})
Set p = map
In order to have this map, I need to specify the headers, and there exists no documentation on how to do that.
Please help me in this context.
Thanks in advance.
Just prepend the contents of product_h.csv (presumably a single line) to the beginning of upd_product.csv. Afterwards, you can just do this (assuming the header value for the first column is id):
CALL apoc.load.csv('/upd_product.csv') YIELD map
MATCH (p:Product {id: map.id})
SET p = map
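The prepend step can be scripted. Below is a minimal Python sketch, assuming both files are UTF-8 text and that product_h.csv holds a single header line; the function name prepend_header and the output file name in the usage comment are hypothetical, not part of APOC or Neo4j.

```python
from pathlib import Path


def prepend_header(header_path, data_path, out_path):
    """Write a new CSV consisting of the header line followed by the data rows."""
    # Drop any trailing line break so exactly one newline separates header and data.
    header = Path(header_path).read_text(encoding="utf-8").rstrip("\r\n")
    data = Path(data_path).read_text(encoding="utf-8")
    Path(out_path).write_text(header + "\n" + data, encoding="utf-8")


# Example call (output file name is hypothetical):
# prepend_header("product_h.csv", "upd_product.csv", "upd_product_with_header.csv")
```

Writing to a separate output file leaves the original dump untouched, so the step can be rerun safely; the resulting file is the one to pass to apoc.load.csv.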
Related
I am trying to create nodes by loading a csv file. Since I wanted to exclude some columns I decided to use apoc.load.csv instead of the simpler LOAD CSV command.
I wanted to have all the columns present as corresponding property value for the nodes. However I am not able to figure out how to do it. When the columns are less you can hardcode it, but in my real dataset I have more than 60 columns so I was hoping that there would be a programmatic way to achieve what I want to do.
Demo Dataset you can use data.csv -
name,age,beverage,country_from,fruit
Selma,9,Soda,RU,Apple
Rana,12,Tea,USA,Orange
Selina,19,Cola,CA,Guava
What I have tried so far that doesn't work yet -
CALL apoc.load.csv('data.csv', {header:true, ignore:['beverage'],
mapping:{
age: {type:'int'},
country_from: {name: "country"}
}
})
YIELD map as row
CREATE (e:Entity $row)
CREATE (f:Fruit {name: row.fruit})
CREATE (c:Country {name: row.country_from})
MERGE (e:Entity)-[:EATS]->(f:Fruit)
MERGE (e:Entity)-[:IS_FROM]->(c:Country)
RETURN e,c,f
Expected Output:
The graph database has Entity nodes with the properties name, age, country, fruit
Initially I was using {row} but then I got the error as described here
The old parameter syntax `{param}` is no longer supported. Please use `$param` instead
so I switched to using $row but then I get -
Expected parameter(s): row
I have followed the ideas from the following links -
https://neo4j-contrib.github.io/neo4j-apoc-procedures/3.4/export-import/load-csv/
https://neo4j.com/labs/apoc/4.1/import/load-csv/
I'm using cypher and the neo4j browser to create nodes from csv input.
I want to read in each row of my csv file with headers and then create a node with that row as properties.
My current code is:
LOAD CSV WITH HEADERS FROM '<yourFilePath>' AS ROW
WITH ROW
CREATE (n:node $ROW)
This throws an error saying parameter missing.
Try this
LOAD CSV WITH HEADERS FROM '<yourFilePath>' AS row
CREATE (n:node)
SET n += row
In Cypher, variables that start with "$" must be passed to the query as parameters. Your Cypher code is locally binding values to the ROW variable (and not passing a parameter), so change $ROW to ROW.
In addition, if you want to make sure that you do not generate duplicate nodes, you should consider using MERGE instead of CREATE. But before you do so, you must carefully read the documentation on MERGE to understand how to use it properly.
I'm new to Neo4j and I'd like to upload a CSV file and create a set of nodes. However, some of the nodes in that CSV file may already exist in the database. Is there an option to load the CSV, create a node for each row, and skip the row if the node already exists?
Thanks
You can use the MERGE clause to avoid creating duplicate nodes and relationships.
However, you need to carefully read the documentation to understand how to use MERGE, as incorrect usage can cause the unintentional creation of nodes and relationships.
MERGE will give you what you want; however, you must be careful about how you identify each record uniquely, or you will end up creating duplicates.
I'll put the desired final form first as attention spans seem to be on the decline...
// This one is safe assuming name is a true unique identifier of your Friends
// and that their favorite colors and foods may change over time
LOAD CSV FROM 'data/friends.csv' AS line
MERGE (f:Friend { name: line[0]})
SET f.favorite_food = line[1]
SET f.favorite_color = line[2]
The merge above will create or find the Friend node with that matching name and then, regardless of whether we are creating it or updating it, set the attributes on it.
If we were to instead provide all the attributes in the merge as such:
// This one is dangerous - all attributes must match in order
// to find the existing Friend node
LOAD CSV FROM 'data/friends.csv' AS line
MERGE (f:Friend { name: line[0], favorite_food: line[1], favorite_color: line[2]})
Then we would fail to find an existing friend every time their favorite_food or favorite_color was updated in the data being (re)loaded, and a duplicate Friend node would be created instead.
Here's an example for anyone whose imagination hasn't fully filled in the blanks...
//Last month's file contained:
Bob Marley,Hemp Seeds,Green
//This month's file contained:
Bob Marley,Soylent Green,Rainbow
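To make the difference concrete, here is a toy Python model of the two loading strategies run against the two monthly files above. It is only a sketch of the matching behaviour, not Neo4j code: the merge function is a hypothetical stand-in that treats nodes as plain dicts and returns the first node containing all the given properties, creating one if none matches.

```python
def merge(nodes, match_props):
    """Find the first node containing all match_props, or create it (toy MERGE)."""
    for node in nodes:
        if all(node.get(k) == v for k, v in match_props.items()):
            return node
    node = dict(match_props)
    nodes.append(node)
    return node


def load_safe(nodes, rows):
    # Match on the unique key only, then set the changeable attributes.
    for name, food, color in rows:
        friend = merge(nodes, {"name": name})
        friend["favorite_food"] = food
        friend["favorite_color"] = color


def load_dangerous(nodes, rows):
    # Match on every attribute: any change means no match, so a new node appears.
    for name, food, color in rows:
        merge(nodes, {"name": name, "favorite_food": food, "favorite_color": color})


last_month = [("Bob Marley", "Hemp Seeds", "Green")]
this_month = [("Bob Marley", "Soylent Green", "Rainbow")]

safe, dangerous = [], []
for rows in (last_month, this_month):
    load_safe(safe, rows)
    load_dangerous(dangerous, rows)

print(len(safe))       # 1: the single Bob Marley node was updated in place
print(len(dangerous))  # 2: a second Bob Marley node was created
```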
I have two files. First file contain list of users with certain properties. I have loaded them in Neo4j as below:
USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM "file:///users.csv" AS row
CREATE (U:User{userid:row.userid, username:row.username})
Now, I have a second file that contains pincodes of the places the user stays or ever stayed at. Example:
User Pincodes
A 001
B 002
A 003
I want to add a property to the User nodes that stores all of a user's pincodes as a list. But when I use the query below, it stores only the latest value, not all the values as a list.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///user_pincode.csv" AS line
MATCH (U:User)
WHERE U.userid=line.userid
SET U.pincode=[line.pincode]
Any suggestions would be really helpful.
[UPDATED]
You can do this:
USING PERIODIC COMMIT 1000
LOAD CSV FROM "file:///user_pincode.csv" AS line
MATCH (u:User)
WHERE u.userid=line[0]
SET u.pincode = COALESCE(u.pincode, []) + line[1]
Since your CSV data has no header, this query omits the WITH HEADERS option and treats line as an array. It appends the new pincode to the end of the existing pincode list (or, if the pincode property does not exist yet, initializes that property with a single-element list). The COALESCE function returns the first argument that is non-NULL.
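The difference between overwriting and appending can be illustrated with a small Python model of the sample rows from the question. This is only a sketch: users are plain dicts, and the coalesce helper is a hypothetical stand-in for Cypher's COALESCE with None playing the role of NULL.

```python
def coalesce(*values):
    """Return the first argument that is not None (stand-in for Cypher COALESCE)."""
    for value in values:
        if value is not None:
            return value
    return None


rows = [("A", "001"), ("B", "002"), ("A", "003")]

# Overwriting, as in the original query: each row replaces the property.
overwritten = {"A": {}, "B": {}}
for userid, pincode in rows:
    overwritten[userid]["pincode"] = [pincode]

# Appending, as in the answer: start from an empty list if the property is missing.
appended = {"A": {}, "B": {}}
for userid, pincode in rows:
    user = appended[userid]
    user["pincode"] = coalesce(user.get("pincode"), []) + [pincode]

print(overwritten["A"]["pincode"])  # ['003']
print(appended["A"]["pincode"])     # ['001', '003']
```

Note that rerunning the append version on the same file would add the same pincodes again, so it suits a one-off load rather than repeated reloads.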
Hello, I want to delete all nodes with the label GRAPH_OBJECT that have a property value (let's call it myprop) that is not in a list of numeric values that I have in a CSV or text file.
How do I accomplish this with Cypher?
This should work.
// load csv
LOAD CSV FROM "file:///values.txt" AS row
// create a collection of the first column turned into numeric values
WITH collect(toInt(row[0])) AS blacklist
// find the nodes
MATCH (node:GRAPH_OBJECT)
// for any of the properties of the node, if its value is in our blacklist
WHERE ANY(property in keys(node) WHERE node[property] IN blacklist)
// delete node and relationships
DETACH DELETE node;
Starting with Michael Hunger's code and updating with your comment, I believe this should work:
// load csv
LOAD CSV FROM "file:///values.txt" AS row
// create a collection of the first column turned into numeric values
WITH collect(toInt(row[0])) AS whitelist
// find the nodes
MATCH (node:GRAPH_OBJECT)
// keep only the nodes whose myprop value is not in our whitelist
WHERE NOT node.myprop IN whitelist
// delete node and relationships
DETACH DELETE node;
The WHERE clause in Michael's code (WHERE ANY(property in keys(node) ...)) appears to be there only so that every property on the node is searched, so if you only need to check myprop it is not needed.
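The numeric conversion in these queries matters because values read from a CSV file arrive as strings, and a string never equals a number. A small Python sketch of the whitelist logic shows the effect; the in-memory CSV text and the node dicts are made-up sample data, not taken from the question.

```python
import csv
import io

# Made-up sample data: one numeric value per line, no header.
csv_text = "1\n2\n3\n"
nodes = [{"myprop": 1}, {"myprop": 5}, {"myprop": 3}]

# Without conversion the whitelist holds strings, so no numeric myprop matches.
raw_whitelist = [row[0] for row in csv.reader(io.StringIO(csv_text))]
wrongly_deleted = [n for n in nodes if n["myprop"] not in raw_whitelist]

# With conversion (the role toInt plays in the query) only myprop = 5 is removed.
whitelist = [int(row[0]) for row in csv.reader(io.StringIO(csv_text))]
deleted = [n for n in nodes if n["myprop"] not in whitelist]

print(len(wrongly_deleted))  # 3: every node would be deleted
print(deleted)               # [{'myprop': 5}]
```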
This should run quite fast because it uses an index. First, create an index on the property that you use to compare the nodes against the CSV.
In your case,
CREATE INDEX ON :GRAPH_OBJECT(myprop)
Then, you can do something like this to delete those nodes which are present in your database but not in the CSV.
// Assuming the CSV has a header row with a 'myprop' column, whose values are compared with this property on your existing nodes
LOAD CSV WITH HEADERS FROM "file:///values.csv" AS line
WITH collect(line.myprop) AS blacklist
MATCH (node:GRAPH_OBJECT)
WHERE EXISTS (node.myprop)
AND
NOT node.myprop IN blacklist
DETACH DELETE node;
That's it. You can prefix the query with PROFILE to see how well it performs and whether it uses the index.