I'm using cypher and the neo4j browser to create nodes from csv input.
I want to read in each row of my csv file with headers and then create a node with that row as properties.
My current code is:
LOAD CSV WITH HEADERS FROM '<yourFilePath>' AS ROW
WITH ROW
CREATE (n:node $ROW)
This throws an error saying parameter missing.
Try this:
LOAD CSV WITH HEADERS FROM '<yourFilePath>' AS row
CREATE (n:node)
SET n += row
In Cypher, variables that start with "$" must be passed to the query as parameters. Your Cypher code is locally binding values to the ROW variable (and not passing a parameter), so change $ROW to ROW.
In addition, if you want to make sure that you do not generate duplicate nodes, you should consider using MERGE instead of CREATE. But before you do so, you must carefully read the documentation on MERGE to understand how to use it properly.
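As a rough sketch of the MERGE variant, assuming the CSV has a column that uniquely identifies each row (the `id` column here is hypothetical and does not come from the question):

```cypher
// Hypothetical: assumes the CSV has a unique, non-empty 'id' column.
LOAD CSV WITH HEADERS FROM '<yourFilePath>' AS row
// Match or create the node on the key column only
MERGE (n:node {id: row.id})
// Copy every column of the row onto the node, keeping any other existing properties
SET n += row
```

Only the key goes into the MERGE pattern; the remaining columns are applied with SET, so re-running the load updates existing nodes instead of duplicating them. MERGE fails on a null property value, so rows with an empty key column would need to be filtered out first.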
Let's say I initially create Order nodes from the CSV file orders.csv:
// Create orders
LOAD CSV WITH HEADERS FROM 'file:///orders.csv' AS row
MERGE (order:Order {orderID: row.OrderID})
ON CREATE SET order.shipName = row.ShipName
Later I added more columns to orders.csv, and I assumed I could add the new properties to the graph this way:
LOAD CSV WITH HEADERS FROM 'file:///orders.csv' AS row
MERGE (order:Order {orderID: row.OrderID})
ON CREATE SET order.shipName = row.ShipName, order.customerId = row.CustomerID, order.employeeID = row.EmployeeID;
Here, two new properties, customerId and employeeID, should be added to each Order node. I tested this command, but it doesn't change the graph at all. Does MERGE add to the graph incrementally?
MERGE works on exactly the expression you provide it, so
MERGE (order:Order {orderID: row.OrderID})
will check for a node with the label Order and an orderID property set to the value (and type) of row.OrderID. If no such node exists, one is created.
Because you are using ON CREATE, that SET only runs if the node is being created by the MERGE, not if it is simply found (matched).
You probably want to look at using ON MATCH... instead - https://neo4j.com/docs/cypher-manual/current/clauses/merge/#query-merge-on-create-on-match
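To illustrate how strictly MERGE matches its pattern, here is a small sketch. The values '10248' and 'VINET' are made up, and it assumes a single Order node with orderID '10248' and no customerId already exists:

```cypher
// Matches the existing node, because only orderID is part of the pattern
MERGE (a:Order {orderID: '10248'})
// Does not match it, because the pattern now also requires customerId,
// so MERGE creates a second Order node
MERGE (b:Order {orderID: '10248', customerId: 'VINET'})
// true when a and b are different nodes, i.e. a duplicate was created
RETURN a <> b AS createdDuplicate
```

This is why new properties are better set in an ON CREATE, ON MATCH, or plain SET step than added to the MERGE pattern itself.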
ON CREATE is only used by MERGE when it needs to create something.
On the other hand, ON MATCH is used by MERGE when it does not need to create anything.
So, your new query should look like this (assuming that you added no new rows to the CSV file, but only columns):
LOAD CSV WITH HEADERS FROM 'file:///orders.csv' AS row
MERGE (order:Order {orderID: row.OrderID})
ON MATCH SET order.customerId = row.CustomerID, order.employeeID = row.EmployeeID;
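If the CSV file might also contain new rows, one possible variant is to keep ON CREATE for the original property and follow the MERGE with a plain SET, which runs whether the node was created or matched. This is a sketch built from the queries above, not a tested migration script:

```cypher
LOAD CSV WITH HEADERS FROM 'file:///orders.csv' AS row
MERGE (order:Order {orderID: row.OrderID})
// Runs only when MERGE had to create the node
ON CREATE SET order.shipName = row.ShipName
// A plain SET after MERGE runs for every row, created or matched
SET order.customerId = row.CustomerID,
    order.employeeID = row.EmployeeID;
```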
I'm migrating data from an RDBMS to Neo4j. I used the 'neo4j-admin import' tool for the bulk import by loading a CSV dump. To accommodate live updates, I'm again getting a CSV dump from the RDBMS, and this time I'm using 'apoc.load.csv'.
I have my data in one file, say 'upd_product.csv',
and the headers in 'product_h.csv'.
Now I want to use apoc.load.csv:
Call apoc.load.csv('/upd_product.csv') yield map, list
Match (p: Product {id: line[0]})
Set p = map
To get this map, I need to specify the headers, and I can't find any documentation on how to do that.
Please help me in this context.
Thanks in advance.
Just prepend the contents of product_h.csv (presumably a single line) to the beginning of upd_product.csv. Afterwards, you can just do this (assuming the header value for the first column is id):
CALL apoc.load.csv('/upd_product.csv') YIELD map
MATCH (p:Product {id: map.id})
SET p = map
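If editing the data file is not an option, a possible alternative is to read both files as plain lists and combine them in the query. This sketch assumes that your APOC version supports the `header: false` config option of apoc.load.csv and provides the apoc.map.fromLists function, and that the header file has the same number of columns as the data file:

```cypher
// Read the single header line as a plain list of column names
CALL apoc.load.csv('/product_h.csv', {header: false}) YIELD list AS headers
WITH headers LIMIT 1
// Read the data file without treating its first line as a header
CALL apoc.load.csv('/upd_product.csv', {header: false}) YIELD list AS fields
// Pair column names with values to build the map that YIELD map would otherwise provide
WITH apoc.map.fromLists(headers, fields) AS map
MATCH (p:Product {id: map.id})
SET p = map
```

Note that `SET p = map` replaces all existing properties of the node, while `SET p += map` only adds or overwrites the ones present in the map.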
I have two files. The first file contains a list of users with certain properties. I loaded them into Neo4j as follows:
USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM "file:///users.csv" AS row
CREATE (U:User{userid:row.userid, username:row.username})
Now, I have a second file that contains pincodes of the places the user stays or ever stayed at. Example:
User Pincodes
A 001
B 002
A 003
I want to add a property to each User node that holds all of that user's pincodes as a list. But when I use the query below, it stores only the latest value instead of a list of all values.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///user_pincode.csv" AS line
MATCH (U:User)
WHERE U.userid=line.userid
SET U.pincode=[line.pincode]
Any suggestions would be really helpful.
[UPDATED]
You can do this:
USING PERIODIC COMMIT 1000
LOAD CSV FROM "file:///user_pincode.csv" AS line
MATCH (u:User)
WHERE u.userid=line[0]
SET u.pincode = COALESCE(u.pincode, []) + line[1]
Since your CSV data has no header, this query omits the WITH HEADERS option and treats line as an array. It appends the new pincode to the end of the existing pincode list (or, if the pincode property does not exist yet, initializes that property with a single-element list). The COALESCE function returns the first argument that is non-NULL.
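The append step can be tried in isolation, without any data loaded. In this small sketch the literal values stand in for u.pincode and line[1]:

```cypher
// First row for a user: the property is still null, so coalesce falls back to []
// Later row: the existing list is kept and the new value is appended
RETURN coalesce(null, []) + '001' AS firstUpdate,
       ['001'] + '003' AS secondUpdate
```

Here firstUpdate is a one-element list and secondUpdate contains both pincodes in insertion order.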
Hello, I want to delete all nodes with the label GRAPH_OBJECT that have a property value (let's call it myprop) that is not in a list of numeric values that I have in a CSV or text file.
How do I accomplish this with Cypher?
This should work.
// load csv
LOAD CSV FROM "file:///values.txt" AS row
// create a collection of the first column turned into numeric values
WITH collect(toInt(row[0])) AS blacklist
// find the nodes
MATCH (node:GRAPH_OBJECT)
// for any of the properties of the node, if its value is in our blacklist
WHERE ANY(property IN keys(node) WHERE node[property] IN blacklist)
// delete node and relationships
DETACH DELETE node;
Starting with Michael Hunger's code and updating with your comment, I believe this should work:
// load csv
LOAD CSV FROM "file:///values.txt" AS row
// create a collection of the first column turned into numeric values
WITH collect(toInt(row[0])) AS whitelist
// find the nodes
MATCH (node:GRAPH_OBJECT)
// keep only the nodes whose myprop value is not in our whitelist
WHERE NOT node.myprop IN whitelist
// delete node and relationships
DETACH DELETE node;
The WHERE clause in Michael's code (WHERE ANY(property IN keys(node) ...)) appears to be there so that every property on the node is searched; if you only need to check myprop, it is not needed.
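Because DETACH DELETE cannot be undone once committed, it may be worth running a read-only version of the query first. This sketch keeps the same matching logic and only counts the nodes that would be removed (toInteger is the newer name for toInt):

```cypher
LOAD CSV FROM "file:///values.txt" AS row
// Same whitelist as in the delete query
WITH collect(toInteger(row[0])) AS whitelist
MATCH (node:GRAPH_OBJECT)
WHERE NOT node.myprop IN whitelist
// Report instead of deleting
RETURN count(node) AS nodesToDelete
```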
This should be quite fast because it uses an index. First, create an index on the property that you compare against the values in the CSV.
In your case,
CREATE INDEX ON :GRAPH_OBJECT(myprop)
Then you can do something like this to delete the nodes that are in your database but not in the CSV.
// Assumes the CSV has a header row with a 'myprop' column to compare against the existing nodes
LOAD CSV WITH HEADERS FROM "file:///values.csv" AS line
// LOAD CSV reads every field as a string, so convert to numbers before comparing
WITH collect(toInt(line.myprop)) AS whitelist
MATCH (node:GRAPH_OBJECT)
WHERE EXISTS(node.myprop)
  AND NOT node.myprop IN whitelist
DETACH DELETE node;
That's it. You can prefix the query with PROFILE to see how well it performs with the index.
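One thing to watch in queries like this: LOAD CSV returns every field as a string, so a numeric property is never found in a list of raw CSV values. A minimal illustration with made-up literals:

```cypher
// The first check compares a number with a string, the second with a converted number
RETURN 100 IN ['100'] AS rawCsvValue,
       100 IN [toInteger('100')] AS convertedValue
```

Only convertedValue evaluates to true. Without the conversion, NOT ... IN would hold for every node and the delete would remove all of them.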
Assume a Node "Properties". I am using "LOAD CSV with headers..."
Following is the sample file format:
fields
a=100,b=110,c=120,d=500
How do I convert the fields column into a "Properties" node that has a, b, c, d as property keys and 100, 110, 120, 500 as the respective values?
LOAD CSV WITH HEADERS FROM 'file:/sample.tsv' AS row FIELDTERMINATOR '\t'
CREATE (:Properties {props: row.fields})
The above does not create individual properties; it sets props to the single string "a=100,b=110,c=120,d=500".
Also, different rows could have different sets of keys. That is, the keys need to be dynamic. (There are other columns as well; I trimmed them for SO.)
fields
a=100,b=110,c=120,d=500
X=300,y=210,Z=420,P=600
...
I am looking for a way to load this without first splitting the key-value pairs into columns. The reason is that they are dynamic: today it is a,b,c,d; tomorrow it may change to aa,bb,cc,dd, etc.
I don't want to keep on changing my loader script to recognize new column headers.
Any pointers to solve this? I am using the latest 3.0.1 neo4j version.
First things first: Your file format currently defines a single header/property: fields:
fields
a=100,b=110,c=120,d=500
Since you defined a tab as field terminator, that entire string (a=100,b=110,c=120,d=500) would end up in your node's props property.
To have properties loaded dynamically, first set up a proper header row:
"a","b","x","y"
1,2,,
,,3,4
Then you can query with something like this:
LOAD CSV WITH HEADERS FROM 'file:///Users/David/overflow.csv' AS row
CREATE (:StackOverflow { a:row.a, b:row.b,x:row.x,y:row.y})
Then when you run something like:
MATCH (so:StackOverflow) RETURN so
You'll get the variable properties you wanted.
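If the file has to stay in its original key=value format, another possible route is to parse the fields string inside the query. This is only a sketch: it assumes the APOC library is installed on a Neo4j version that supports APOC functions, that neither keys nor values contain ',' or '=', and it leaves all values as strings:

```cypher
LOAD CSV WITH HEADERS FROM 'file:/sample.tsv' AS row FIELDTERMINATOR '\t'
// 'a=100,b=110' becomes [['a','100'], ['b','110']]
WITH [pair IN split(row.fields, ',') | split(pair, '=')] AS pairs
CREATE (p:Properties)
// Build a map from the [key, value] pairs and copy it onto the node
SET p += apoc.map.fromPairs(pairs)
```

With this approach the loader script does not need to change when new keys appear, because the property names come from the data itself.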