Neo4J Delete Nodes With Field Value Not in CSV with Cypher - neo4j

Hello, I want to delete all nodes with the label GRAPH_OBJECT that have a property value (let's call it myprop) that is not in a list of numeric values that I have in a CSV or text file.
How do I accomplish this with Cypher?

This should work.
// load csv
LOAD CSV FROM "file:///values.txt" AS row
// create a collection of the first column turned into numeric values
WITH collect(toInteger(row[0])) AS blacklist
// find the nodes
MATCH (node:GRAPH_OBJECT)
// keep a node if any of its properties has a value in our blacklist
WHERE ANY(property IN keys(node) WHERE node[property] IN blacklist)
// delete node and relationships
DETACH DELETE node;

Starting with Michael Hunger's code and updating with your comment, I believe this should work:
// load csv
LOAD CSV FROM "file:///values.txt" AS row
// create a collection of the first column turned into numeric values
WITH collect(toInteger(row[0])) AS whitelist
// find the nodes
MATCH (node:GRAPH_OBJECT)
// keep only the nodes whose myprop value is not in our whitelist
WHERE NOT node.myprop IN whitelist
// delete node and relationships
DETACH DELETE node;
The first WHERE clause in Michael's code (WHERE ANY(property IN keys(node) ...)) appears to be there only so that every property on the node is searched; if you only need to check myprop, it is not needed.

This should run quite fast, as it also uses an index. First, create an index on the property that you use as the reference for comparing nodes against the CSV.
In your case:
CREATE INDEX ON :GRAPH_OBJECT(myprop)
Then you can do something like this to delete the nodes that are in your database but not in the CSV.
// Assumes the CSV has a header row with a 'myprop' column, which is compared against the myprop property of the existing nodes
LOAD CSV WITH HEADERS FROM "file:///values.csv" AS line
// CSV fields are read as strings, so convert them to match the numeric property
WITH collect(toInteger(line.myprop)) AS whitelist
MATCH (node:GRAPH_OBJECT)
WHERE EXISTS(node.myprop)
AND NOT node.myprop IN whitelist
DETACH DELETE node;
That's it. You can prefix the query with PROFILE to see how well it performs with the index.
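Because PROFILE actually executes the query it profiles, a read-only dry run may be a safer first step than profiling the delete itself. The following is a minimal sketch, assuming the same values.csv with a header row containing a myprop column of integer values; the names keep and wouldDelete are illustrative only.

```cypher
// Dry run: count the nodes the delete would remove, and profile the plan.
// PROFILE executes the query, so a read-only RETURN is used here instead of DETACH DELETE.
PROFILE
LOAD CSV WITH HEADERS FROM "file:///values.csv" AS line
WITH collect(toInteger(line.myprop)) AS keep
MATCH (node:GRAPH_OBJECT)
WHERE node.myprop IS NOT NULL
  AND NOT node.myprop IN keep
RETURN count(node) AS wouldDelete;
```

Replacing PROFILE with EXPLAIN shows the planned operators without running the query at all.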

Related

Neo4J CSV into Props

I'm using Cypher and the Neo4j Browser to create nodes from CSV input.
I want to read in each row of my CSV file with headers and then create a node with that row as properties.
My current code is:
LOAD CSV WITH HEADERS FROM '<yourFilePath>' AS ROW
WITH ROW
CREATE (n:node $ROW)
This throws an error saying a parameter is missing.
Try this:
LOAD CSV WITH HEADERS FROM '<yourFilePath>' AS row
CREATE (n:node)
SET n += row
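In Cypher, names that start with "$" are parameters, which must be passed to the query from outside. Your code binds each row to the local variable ROW rather than passing a parameter, so $ROW is never defined. Note that a node pattern accepts only a map literal or a parameter as its properties, so instead of writing ROW inside the pattern, create the node and then copy the row onto it with SET n = ROW (or SET n += ROW).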
In addition, if you want to make sure that you do not generate duplicate nodes, you should consider using MERGE instead of CREATE. But before you do so, you must carefully read the documentation on MERGE to understand how to use it properly.
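As a rough illustration of the MERGE approach, the sketch below merges on a single key and then copies the remaining columns onto the node. It assumes the CSV has a column named id that uniquely identifies each row; that column name is hypothetical and not taken from the question.

```cypher
// Sketch: one node per distinct id, with all CSV columns copied onto it.
// 'id' is an assumed unique column; substitute whatever identifies a row in your file.
LOAD CSV WITH HEADERS FROM '<yourFilePath>' AS row
// MERGE cannot match on a null property value, so skip rows without a key
WITH row WHERE row.id IS NOT NULL
MERGE (n:node {id: row.id})
SET n += row
```

Merging on the key alone and setting the other properties afterwards means a re-import updates existing nodes rather than creating new ones.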

Skip existing node while loading csv

I'm new to Neo4j and I'd like to upload a CSV file and create a set of nodes. However, some of the nodes in that CSV file may already exist in my database. Is there an option to load the CSV, create a node for each row, and skip the row if the node already exists?
Thanks
You can use the MERGE clause to avoid creating duplicate nodes and relationships.
However, you need to carefully read the documentation to understand how to use MERGE, as incorrect usage can cause the unintentional creation of nodes and relationships.
MERGE will give you what you want; however, you must be careful about how you identify each record uniquely to prevent creating duplicates.
I'll put the desired final form first as attention spans seem to be on the decline...
// This one is safe assuming name is a true unique identifier of your Friends
// and that their favorite colors and foods may change over time
LOAD CSV FROM 'data/friends.csv' AS line
MERGE (f:Friend { name: line[0]})
SET f.favorite_food = line[1]
SET f.favorite_color = line[2]
The merge above will create or find the Friend node with that matching name and then, regardless of whether we are creating it or updating it, set the attributes on it.
If we were to instead provide all the attributes in the merge as such:
// This one is dangerous - all attributes must match in order
// to find the existing Friend node
LOAD CSV FROM 'data/friends.csv' AS line
MERGE (f:Friend { name: line[0], favorite_food: line[1], favorite_color: line[2]})
Then we would fail to find an existing friend every time their favorite_food or favorite_color was updated in the data being (re)loaded, and MERGE would create a duplicate Friend node instead.
Here's an example for anyone whose imagination hasn't fully filled in the blanks...
//Last month's file contained:
Bob Marley,Hemp Seeds,Green
//This month's file contained:
Bob Marley,Soylent Green,Rainbow
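To see the effect, a check along these lines could be run after loading both months' files. This is only a sketch, and it assumes the database held no Friend nodes before the first load; the alias bobNodes is illustrative.

```cypher
// Count the Friend nodes that share the name from the example files.
// With the all-attributes MERGE, the second month's row matches nothing and creates
// its own node, so the count exceeds 1; with the name-only MERGE it stays at 1.
MATCH (f:Friend {name: 'Bob Marley'})
RETURN count(f) AS bobNodes;
```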

Unable to create a list of properties for each node

I have two files. The first file contains a list of users with certain properties. I have loaded them into Neo4j as below:
USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM "file:///users.csv" AS row
CREATE (U:User{userid:row.userid, username:row.username})
Now, I have a second file that contains pincodes of the places the user stays or ever stayed at. Example:
User Pincodes
A 001
B 002
A 003
I want to add a property to the User nodes that holds all of a user's pincodes as a list. But when I use the query below, it stores only the latest value rather than all the values as a list.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///user_pincode.csv" AS line
MATCH (U:User)
WHERE U.userid=line.userid
SET U.pincode=[line.pincode]
Any suggestions would be really helpful.
[UPDATED]
You can do this:
USING PERIODIC COMMIT 1000
LOAD CSV FROM "file:///user_pincode.csv" AS line
MATCH (u:User)
WHERE u.userid=line[0]
SET u.pincode = COALESCE(u.pincode, []) + line[1]
Since your CSV data has no header, this query omits the WITH HEADERS option and treats line as an array. It appends the new pincode to the end of the existing pincode list (or, if the pincode property did not already exist, initializes that property with a single-element list). The COALESCE function returns the first argument that is non-NULL.
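One thing to keep in mind is that the append-based query adds the same pincodes again if the file is loaded a second time. An alternative sketch, under the same assumption that the file has no header row, collects the distinct pincodes per user first and writes the list once. It overwrites any existing pincode property, and the aggregation means the whole file is read before any write, so it is shown without USING PERIODIC COMMIT.

```cypher
// Sketch: build each user's pincode list in one pass, without duplicates.
LOAD CSV FROM "file:///user_pincode.csv" AS line
// group rows by user id and gather that user's distinct pincodes
WITH line[0] AS userid, collect(DISTINCT line[1]) AS pincodes
MATCH (u:User {userid: userid})
// replaces the property, so re-running the load gives the same result
SET u.pincode = pincodes
```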

Neo4j display only one node for 1 to many relationship

I'm trying to solve a problem with displaying a 1:many relationship in Neo4j. My dataset is as below:
child,desc,type,parent
1,PGD,Exchange,0
2,MSE 1,MSE,1
3,MSE 2,MSE,1
4,MSE 3,MSE,1
5,MSE 4,MSE,1
6,BRAS 1,BRAS,2
6,BRAS 1,BRAS,3
7,BRAS 2,BRAS,4
7,BRAS 2,BRAS,5
10,NPE 1,NPE,6
11,NPE 2,NPE,7
12,OLT,OLT,10
12,OLT,OLT,11
13,FDC,FDC,12
14,FDP,FDP,13
15,Cust 1,Customer,14
16,Cust 2,Customer,14
17,Cust 3,Customer,14
LOAD CSV WITH HEADERS FROM 'file:///FTTH_sample.csv' AS line
CREATE(:ftthsample
{child_id:line.child,
desc:line.desc,
type:line.type,
parent_id:line.parent});
//Relations
match (child:ftthsample),(parent:ftthsample)
where child.child_id=parent.parent_id
create (child)-[:test]->(parent)
//Query:
MATCH (child)-[childrel:test*]-(elem)-[parentrel:test*]->(parent)
WHERE elem.desc='FDP'
RETURN child,childrel,elem,parentrel
It returns a display as below.
I want the duplicate nodes to be displayed as one. I'm a newbie with Neo4j. Can any of the experts help, please?
This seems like an error in your graph creation query. You have a few lines in your CSV that specify the same node multiple times, but with different parents:
6,BRAS 1,BRAS,2
6,BRAS 1,BRAS,3
I'm guessing you actually want this to be a single node, with parent relationships to nodes with the given parent ids, instead of two separate nodes.
Let's adjust your import query. Instead of using a CREATE on each line, we'll use MERGE, and just on the child_id, which seems to be your primary key (maybe consider just using id instead, as a node can have an id on its own, without having to consider the context of whether it's a parent or child). We can use the ON CREATE clause after MERGE to add in the remaining properties only if the MERGE resulted in node creation (instead of matching an existing node).
That will ensure we only have one node created per child_id.
Rather than having to rematch the child, we can use the child node we just created, match on the parent, and create the relationship.
LOAD CSV WITH HEADERS FROM 'file:///FTTH_sample.csv' AS line
MERGE(child:ftthsample {child_id:line.child})
ON CREATE SET
child.desc = line.desc,
child.type = line.type
WITH child, line.parent as parentId
MATCH (parent:ftthsample)
WHERE parent.child_id = parentId
MERGE (child)-[:test]->(parent)
Note that we haven't added line.parent as a property. It's not needed, since we only use that to create relationships, and after the relationships are there, we won't need those again.
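A quick way to check the result is to look at one of the nodes that appeared twice in the CSV. The sketch below assumes the import above has been run on the sample data; child_id is compared as a string because LOAD CSV reads every field as a string.

```cypher
// Sketch: node 6 (BRAS 1) should now be a single node with a :test
// relationship to each of its parents, rather than two separate nodes.
MATCH (child:ftthsample {child_id: '6'})-[:test]->(parent:ftthsample)
RETURN child.desc AS child, collect(parent.child_id) AS parents
```

If the merge worked as intended, this returns a single row for the child with both parent ids in the collected list.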

How to include properties with NULL values using Neo4j MERGE

I have a Node table and an Edge table, both available as CSV files.
I managed to load the Node table by:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///NodesETL.csv' AS line
CREATE (:InfoNodes {id: toString(line.id), description: toString(line.description)})
This query creates the InfoNodes with the field values of the CSV file as properties of :InfoNodes which is fine.
InfoNodes have relationships with other InfoNodes, i.e. these relationships exist between nodes with the same label.
These relationships are stored in an Edge table available as an additional CSV file.
Every row of this Edge table holds idfrom and idto fields that defines the relationships between InfoNodes on basis of their id property.
The Edge table also holds 3 additional fields representing properties of the relationship. The firstproperty is always a string and never NULL, i.e. never an empty string. The secondproperty and thirdproperty, both of type string, can be NULL (an empty field such as ""). So secondproperty and/or thirdproperty can contain NULL values.
I try to use this Edge table to create the [:RELATIONSHIP {firstproperty:, secondproperty:, thirdproperty:}] relationships between (:InfoNodes) by:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///EdgesETL.csv' AS line
MATCH (from:InfoNodes{id: toString(line.idfrom)})
MATCH (to:InfoNodes{id: toString(line.idto)})
MERGE (from)-[:RELATION {firstproperty: toString(line.firstproperty), secondproperty: toString(line.secondproperty), thirdproperty: toString(line.thirdproperty)}]->(to)
This second Cypher script results in an error when secondproperty and thirdproperty in the Edge table contain NULL values.
The Neo4j error message is: Cannot merge relationship using null property value for secondproperty.
When I remove the secondproperty field and the secondproperty: property from the second script, then the same type of error occurs for thirdproperty: Cannot merge relationship using null property value for thirdproperty
When I remove both the secondproperty and thirdproperty fields and properties from the script, then the [:RELATION] relationships between InfoNodes are created, with the firstproperty field stored as the firstproperty: property of the [:RELATION] relationship.
Question: How to extend the second script in order to load from the Edge table the second property and thirdproperty fields into secondproperty: and thirdproperty: of [:RELATION] relationships including NULL values?
Can't MERGE with null values; 'Cannot merge node using null property value' in neo4j describes the same problem but doesn't answer my question in case of multiple fields/properties with NULL values.
You'll want to re-review the MERGE section in the developer guide. Specifically, in the introduction, there's mention of ON CREATE and ON MATCH. This allows you to set properties in cases where the MERGE resulted in a creation, or when instead the MERGE matched upon existing elements.
Typically you will want to MERGE only properties that uniquely define the thing, like IDs, and set the rest of the properties within ON CREATE.
Your query after this change might look something like this:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///EdgesETL.csv' AS line
MATCH (from:InfoNodes{id: toString(line.idfrom)})
MATCH (to:InfoNodes{id: toString(line.idto)})
MERGE (from)-[r:RELATION {firstproperty: toString(line.firstproperty)}]->(to)
ON CREATE SET r.secondproperty = toString(line.secondproperty), r.thirdproperty = toString(line.thirdproperty)
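Note that Neo4j does not store properties with null values: setting a property to null simply leaves it absent from the relationship. After the load, the rows that had empty fields can still be found by testing for the missing property, as in this sketch (the alias withoutSecondProperty is illustrative).

```cypher
// Sketch: count the relationships whose secondproperty was NULL in the Edge table.
// A property that was set to null is absent, and an absent property reads back as null.
MATCH (:InfoNodes)-[r:RELATION]->(:InfoNodes)
WHERE r.secondproperty IS NULL
RETURN count(r) AS withoutSecondProperty
```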

Resources