How to include properties with NULL values using Neo4j MERGE - neo4j

I have a Node table and an Edge table, both available as CSV files.
I managed to load the Node table by:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///NodesETL.csv' AS line
CREATE (:InfoNodes {id: toString(line.id), description: toString(line.description)})
This query creates the InfoNodes with the field values of the CSV file as properties of :InfoNodes which is fine.
InfoNodes have relationships with other InfoNodes, i.e. these relationships exist between nodes with the same label.
These relationships are stored in an Edge table available as an additional CSV file.
Every row of this Edge table holds idfrom and idto fields that define the relationships between InfoNodes on the basis of their id property.
The Edge table also holds 3 additional fields representing properties of the relationship. The firstproperty is always a string and never NULL, i.e. never an empty string. The secondproperty and thirdproperty, both of type string, can be NULL (an empty field such as "").
I try to use this Edge table to create the [:RELATION {firstproperty:, secondproperty:, thirdproperty:}] relationships between (:InfoNodes) by:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///EdgesETL.csv' AS line
MATCH (from:InfoNodes{id: toString(line.idfrom)})
MATCH (to:InfoNodes{id: toString(line.idto)})
MERGE (from)-[:RELATION {firstproperty: toString(line.firstproperty), secondproperty: toString(line.secondproperty), thirdproperty: toString(line.thirdproperty)}]->(to)
This second Cypher script results in an error when secondproperty and thirdproperty in the Edge table contain NULL values.
The Neo4j error message is: Cannot merge relationship using null property value for secondproperty.
When I remove the secondproperty field and the secondproperty: property from the second script, then the same type of error occurs mentioning thirdproperty: Cannot merge relationship using null property value for thirdproperty
When I remove the secondproperty and thirdproperty fields and properties from the previous script, then the [:RELATION] relationships between InfoNodes are created, with the firstproperty field stored as the firstproperty: property of the [:RELATION] relationship.
Question: How can I extend the second script to load the secondproperty and thirdproperty fields from the Edge table into secondproperty: and thirdproperty: of the [:RELATION] relationships, including NULL values?
Can't MERGE with null values; 'Cannot merge node using null property value' in neo4j describes the same problem but doesn't answer my question in case of multiple fields/properties with NULL values.

You'll want to re-review the MERGE section in the developer guide. Specifically, in the introduction, there's mention of ON CREATE and ON MATCH. This allows you to set properties in cases where the MERGE resulted in a creation, or when instead the MERGE matched upon existing elements.
Typically you will want to MERGE only properties that uniquely define the thing, like IDs, and set the rest of the properties within ON CREATE.
Your query after this change might look something like this:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///EdgesETL.csv' AS line
MATCH (from:InfoNodes{id: toString(line.idfrom)})
MATCH (to:InfoNodes{id: toString(line.idto)})
MERGE (from)-[r:RELATION {firstproperty: toString(line.firstproperty)}]->(to)
ON CREATE SET r.secondproperty = toString(line.secondproperty), r.thirdproperty = toString(line.thirdproperty)
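Note that setting a property to null in Cypher simply leaves the property unset, so with the query above a row with an empty secondproperty yields a relationship without that property. If a stored placeholder is preferred, one possible variant (a sketch, assuming an empty string is an acceptable default, which the question does not state) uses coalesce():

```cypher
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///EdgesETL.csv' AS line
MATCH (from:InfoNodes {id: toString(line.idfrom)})
MATCH (to:InfoNodes {id: toString(line.idto)})
// Merge only on the property that is never null
MERGE (from)-[r:RELATION {firstproperty: toString(line.firstproperty)}]->(to)
// Assumption: '' is used as a placeholder for missing values
ON CREATE SET r.secondproperty = coalesce(line.secondproperty, ''),
              r.thirdproperty  = coalesce(line.thirdproperty, '')
```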

Related

Skip existing node while loading csv

I'm new to Neo4j and I'd like to upload a CSV file and create a set of nodes. However, I already have some existing nodes that may appear in that CSV file. Is there an option to load the CSV, create the nodes based on each row, and skip the row if the node already exists?
Thanks
You can use the MERGE clause to avoid creating duplicate nodes and relationships.
However, you need to carefully read the documentation to understand how to use MERGE, as incorrect usage can cause the unintentional creation of nodes and relationships.
MERGE will give you what you want; however, you must be careful how you identify the record uniquely to prevent creating duplicates.
I'll put the desired final form first as attention spans seem to be on the decline...
// This one is safe assuming name is a true unique identifier of your Friends
// and that their favorite colors and foods may change over time
LOAD CSV FROM 'data/friends.csv' AS line
MERGE (f:Friend { name: line[0]})
SET f.favorite_food = line[1]
SET f.favorite_color = line[2]
The merge above will create or find the Friend node with that matching name and then, regardless of whether we are creating it or updating it, set the attributes on it.
If we were to instead provide all the attributes in the merge as such:
// This one is dangerous - all attributes must match in order
// to find the existing Friend node
LOAD CSV FROM 'data/friends.csv' AS line
MERGE (f:Friend { name: line[0], favorite_food: line[1], favorite_color: line[2]})
Then we would fail to find an existing friend every time their favorite_food or favorite_color was updated in our data being (re)loaded.
Here's an example for anyone whose imagination hasn't fully filled in the blanks...
//Last month's file contained:
Bob Marley,Hemp Seeds,Green
//This month's file contained:
Bob Marley,Soylent Green,Rainbow
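If rows for friends that already exist should be skipped entirely, as the question asks, rather than updated, a possible variant (a sketch reusing the friends.csv layout from above) moves the property assignments into ON CREATE SET, so they only run when the MERGE creates a new node:

```cypher
LOAD CSV FROM 'data/friends.csv' AS line
// Match on the unique identifier only
MERGE (f:Friend {name: line[0]})
// Runs only when the node was just created; existing nodes are left unchanged
ON CREATE SET f.favorite_food = line[1],
              f.favorite_color = line[2]
```

With last month's and this month's rows from the example, Bob Marley would keep Hemp Seeds and Green, because the second load matches the existing node and skips the ON CREATE part.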

unexpected failure on unique constraint in neo4j

I'm trying to load some data into neo4j from csv files, and it seems a unique constraint error is triggered when it shouldn't be. In particular, I created a constraint using
CREATE CONSTRAINT ON (node:`researcher`) ASSERT node.`id_patstats` IS UNIQUE;
Then, after inserting some data in neo4j, if I run (in neo4j browser)
MATCH (n:researcher {id_patstats: "2789"})
RETURN n
I get no results (no changes, no records), but if I run
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM
'file:///home/manu/proyectos/PTL_RDIgraphs/rdigraphs/datamanager/tmp_patents/person906.csv' AS line
MERGE (n:researcher {`name` : line.`person_name`})
SET n.`id_patstats` = line.`person_id`;
I get
Neo.ClientError.Schema.ConstraintValidationFailed: Node(324016)
already exists with label researcher and property id_patstats =
'2789'
and the content of file person906.csv is
manu#cochi tmp_patents $cat person906.csv
person_id,person_name,doc_std_name,doc_std_name_id
2789,"li, jian",LI JIAN,2390
(this is a minimal non-working example extracted from a larger dataset; also, in the original "person906.csv" I made sure that "id_patstats" is really unique).
Any clue?
EDIT:
Still struggling with this...
If I run
MATCH (n)
WHERE EXISTS(n.id_patstats)
RETURN DISTINCT "node" as entity, n.id_patstats AS id_patstats
LIMIT 25
UNION ALL
MATCH ()-[r]-()
WHERE EXISTS(r.id_patstats)
RETURN DISTINCT "relationship" AS entity, r.id_patstats AS id_patstats
LIMIT 25
(clicking in the neo4j browser to get some examples of the id_patstats property) I get
(no changes, no records)
that is, id_patstats property is not set anywhere. Moreover
MATCH (n:researcher {`name` : "li, jian"})
SET n.`id_patstats` = XXX;
this will always trigger an error regardless of XXX, which (I guess) means the actual problem is that the name "li, jian" is already present. Although I didn't set any constraint on the name property, I'm guessing neo4j goes like this: you are trying to set a UNIQUE property on a node matching a property (name) that is not necessarily UNIQUE; hence that match could yield several nodes and I can't set the same UNIQUE property on all of them...so I won't even try
At least two of your researchers have the same name. You shouldn't MERGE by name and then add id as a property. You should MERGE by id and add the name as a property and it will work fine.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM
'file:///home/manu/proyectos/PTL_RDIgraphs/rdigraphs/datamanager/tmp_patents/person906.csv' AS line
MERGE (n:researcher {`id_patstats`:line.`person_id`})
SET n.`name` = line.`person_name`;
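To check whether the same name really occurs with more than one id in the data being loaded, a quick diagnostic along these lines may help (a sketch; the path below is the one from the question and would need to point at the full dataset rather than the one-row extract):

```cypher
LOAD CSV WITH HEADERS FROM
'file:///home/manu/proyectos/PTL_RDIgraphs/rdigraphs/datamanager/tmp_patents/person906.csv' AS line
// Group rows by name and collect the distinct ids seen for each name
WITH line.person_name AS name, collect(DISTINCT line.person_id) AS ids
// Keep only names that map to more than one id
WHERE size(ids) > 1
RETURN name, ids;
```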

How to refer/make use of dynamic id generated by Neo4j?

Here is my model:
(domain)-[:has]-(data_file)-[:contains]-(entities)-[:have]-(columns)-[:have]-(datatypes)
Code to import data
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS from "file:///test.csv" as row
MERGE(domain:Domain{name:row.Domain})
MERGE(data_file:SourceFile{name:row.data_File})
MERGE(entity:Entity{name:row.Entity})
MERGE(column:ColumnName{name:row.Column_Name, Data_Type: row.Data_Type})
MERGE (domain)-[:HAS]->(data_file)
MERGE (data_file)-[:HAS]->(entity)
MERGE (entity)-[:HAS]->(column)
The problem I have now is that when two different entity nodes have the same Column_Name but with different data types, both entity nodes point to a single node (matched by its Column_Name).
e.g
1. data_file-A has Entity-A, Column_Name: user_id (Data_Type: String)
2. data_file-B has Entity-B, Column_Name: user_id (Data_Type: Boolean)
When I run the import above, I expected that (1.) above should have an edge to node user_id whose data type is String, and (2.) should have another edge to another node user_id whose data type is Boolean. But this is not happening. Both point to a single node whose name is user_id
How can I resolve this? Is there a way to refer to the dynamic id created by neo4j while creating edges, so that I can create edges between those dynamic ids?
Something like MERGE (entity.id)-[:HAS]->(column.id)
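One approach that may help, offered as a sketch rather than a confirmed fix, is to MERGE the column as part of a pattern that starts at the already-bound entity node, so each entity gets its own column node instead of sharing one. No internal id is needed, because the variables entity and column already refer to the specific nodes:

```cypher
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///test.csv" AS row
MERGE (domain:Domain {name: row.Domain})
MERGE (data_file:SourceFile {name: row.data_File})
MERGE (entity:Entity {name: row.Entity})
MERGE (domain)-[:HAS]->(data_file)
MERGE (data_file)-[:HAS]->(entity)
// entity is already bound, so this MERGE only looks for a matching column
// attached to this entity and creates a new column node otherwise
MERGE (entity)-[:HAS]->(column:ColumnName {name: row.Column_Name, Data_Type: row.Data_Type})
```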

Neo4J Delete Nodes With Field Value Not in CSV with Cypher

Hello, I want to delete all nodes with the label GRAPH_OBJECT that have a property value (let's call it myprop) that is not in a list of numeric values that I have in a CSV or text file.
How do I accomplish this with Cypher?
This should work.
// load csv
LOAD CSV FROM "file:///values.txt" AS row
// create a collection of the first column turned into numeric values
WITH collect(toInt(row[0])) AS blacklist
// find the nodes
MATCH (node:GRAPH_OBJECT)
// for any of the properties of the node, if its value is in our blacklist
WHERE ANY(property in keys(node) WHERE node[property] IN blacklist)
// delete node and relationships
DETACH DELETE node;
Starting with Michael Hunger's code and updating with your comment, I believe this should work:
// load csv
LOAD CSV FROM "file:///values.txt" AS row
// create a collection of the first column turned into numeric values
WITH collect(toInt(row[0])) AS whitelist
// find the nodes
MATCH (node:GRAPH_OBJECT)
// keep only the nodes whose myprop value is not in our whitelist
WHERE NOT node.myprop IN whitelist
// delete node and relationships
DETACH DELETE node;
The WHERE clause in Michael's code (WHERE ANY(property IN keys(node) ...)) appears to be there only so that every property on the node is searched; if you only need to check myprop, it is not needed.
This should work quite fast as it also uses an index. First, create an index on the property that you use as the reference for comparing nodes against the CSV.
In your case,
CREATE INDEX ON :GRAPH_OBJECT(myprop)
Then, you can do something like this to delete those nodes which are in your database but not present in the CSV.
LOAD CSV WITH HEADERS FROM "file:///values.csv" AS line
// Assumes the CSV has a header row with a 'myprop' column; values are read as
// strings, so convert them to numbers before comparing with the numeric property
WITH collect(toInt(line.myprop)) AS whitelist
MATCH (node:GRAPH_OBJECT)
WHERE EXISTS(node.myprop)
AND NOT node.myprop IN whitelist
DETACH DELETE node;
That's it. You can prefix the query with PROFILE to see how well it performs using the index.

Show Relations from Node to Node Neo4j

I'm very new to Neo4j, been playing around with it for a couple of days now.
I'm trying to use Neo4j to map our company's database by showing how one table is related to another (data is pulled to or pushed from one table to another) and what scripts are used to do this pulling and pushing. To do this, I'm using three different properties: TableName, ScriptName, and TableTouch.
TableName: Table node which corresponds to the name of a table
ScriptName: Script node which corresponds to the script which updates a table
TableTouch: Used to show which table affects another table
Here is an example of the .CSV I'm importing:
TableName ScriptName TableTouch
Source ScriptA Water/Oil
Water ScriptB Source
Oil ScriptC Source
Here is the code I have thus far:
CREATE CONSTRAINT ON (c:Table) ASSERT c.TableName IS UNIQUE;
CREATE CONSTRAINT ON (c:Scripts) ASSERT c.ScriptName IS UNIQUE;
LOAD CSV WITH HEADERS FROM
"file:///C:\\NeoTest.CSV" AS line
MERGE (table:Table {TableName: UPPER(line.TableName)})
SET table.TableTouch = UPPER(line.TableTouch)
MERGE (script:Scripts {ScriptName: UPPER(line.ScriptName)})
MERGE (table) - [:UPDATED_BY] -> (script)
This will relate scripts to their appropriate tables and load in all the table and script nodes.
Now an example of what I need is for Node "Source" to connect to Node "Water" because "Source" = Water.TableTouch and "Water" = Source.TableTouch.
Assume any given table could have multiple tables listed in the TableTouch property.
I want the TableName nodes to connect to other TableName nodes where the TableName of one node is found in the TableName.TableTouch of another node. How would I go about doing this? Do I need to have my .CSV formatted differently for this?
Thanks,
-Andrew
Edit: This may make things more clear
What I have:
What I'd like to have (red arrows):
[UPDATED]
If I understand your scenario, you want to represent the Script that is used to generate each Table, and what other table was used by that Script.
And, if I understand the meaning of your CSV file and your pictures, it looks like the Source table is generated by ScriptA without using data from any other tables. If so, you can create your CSV file to look something like this (where the Source table row's TableTouch column has the special value NOTHING -- you can use some other name -- to indicate that column actually has no value):
TableName,ScriptName,TableTouch
Source,ScriptA,NOTHING
Water,ScriptB,Source
Oil,ScriptC,Source
Data model:
(src:Table {name: 'Source'})<-[:USES]-(s:Script {name: 'ScriptC'})-[:MAKES]->(dest:Table {name: 'Oil'})
Note: This data model allows a single Script to "use" any number of source Tables and "make" any number of destination Tables.
Create Constraints
CREATE CONSTRAINT ON (t:Table) ASSERT t.name IS UNIQUE;
CREATE CONSTRAINT ON (s:Script) ASSERT s.name IS UNIQUE;
Import data
LOAD CSV WITH HEADERS FROM "file:///C:\\NeoTest.CSV" AS line
MERGE (src:Table {name: line.TableTouch})
MERGE (dest:Table {name: line.TableName})
MERGE (script:Script {name: line.ScriptName})
MERGE (script)-[:USES]->(src)
MERGE (script)-[:MAKES]->(dest)
Note: To keep the query simple, we just go ahead and create (at most) one NOTHING node to represent the absence of a source Table.
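The question's original CSV lists several tables in one TableTouch cell (Water/Oil) and asks about multiple tables per cell. A possible extension of the import above (a sketch, assuming '/' is the separator, as in that sample row) splits the cell and creates one USES relationship per listed table:

```cypher
LOAD CSV WITH HEADERS FROM "file:///C:\\NeoTest.CSV" AS line
MERGE (dest:Table {name: line.TableName})
MERGE (script:Script {name: line.ScriptName})
MERGE (script)-[:MAKES]->(dest)
// Assumption: multiple source tables are separated by '/'
WITH script, split(line.TableTouch, '/') AS srcNames
UNWIND srcNames AS srcName
MERGE (src:Table {name: srcName})
MERGE (script)-[:USES]->(src)
```

A cell with a single name still works, since split() then returns a one-element list.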
