I'm trying to load some data into neo4j from csv files, and it seems a unique constraint error is triggered when it shouldn't. In particular, I created a contraint using
CREATE CONSTRAINT ON (node:`researcher`) ASSERT node.`id_patstats` IS UNIQUE;
Then, after inserting some data in neo4j, if I run (in neo4j browser)
MATCH (n:researcher {id_patstats: "2789"})
RETURN n
I get no results (no changes, no records), but if I run
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM
'file:///home/manu/proyectos/PTL_RDIgraphs/rdigraphs/datamanager/tmp_patents/person906.csv' AS line
MERGE (n:researcher {`name` : line.`person_name`})
SET n.`id_patstats` = line.`person_id`;
I get
Neo.ClientError.Schema.ConstraintValidationFailed: Node(324016)
already exists with label researcher and property id_patstats =
'2789'
and the content of file person906.csv is
manu#cochi tmp_patents $cat person906.csv
person_id,person_name,doc_std_name,doc_std_name_id
2789,"li, jian",LI JIAN,2390
(this a minimum non working example extracted from a larger dataset; also, in the original "person906.csv" I made sure that "id_patstats" is really unique).
Any clue?
EDIT:
Still struggling with this...
If I run
MATCH (n)
WHERE EXISTS(n.id_patstats)
RETURN DISTINCT "node" as entity, n.id_patstats AS id_patstats
LIMIT 25
UNION ALL
MATCH ()-[r]-()
WHERE EXISTS(r.id_patstats)
RETURN DISTINCT "relationship" AS entity, r.id_patstats AS id_patstats
LIMIT 25
(clicking in the neo4j browser to get some examples of the id_patstats property) I get
(no changes, no records)
that is, id_patstats property is not set anywhere. Moreover
MATCH (n:researcher {`name` : "li, jian"})
SET n.`id_patstats` = XXX;
this will always trigger an error regardless of XXX, which (I guess) means the actual problem is that the name "li, jian" is already present. Although I didn't set any constraint on the name property, I'm guessing neo4j goes like this: you are trying to set a UNIQUE property on a node matching a property (name) that is not necessarily UNIQUE; hence that match could yield several nodes and I can't set the same UNIQUE property on all of them...so I won't even try
At least two of your researchers have the same name. You shouldn't MERGE by name and then add id as a property. You should MERGE by id and add the name as a property and it will work fine.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM
'file:///home/manu/proyectos/PTL_RDIgraphs/rdigraphs/datamanager/tmp_patents/person906.csv' AS line
MERGE (n:researcher {`id_patstats`:line.`person_id`})
SET n.name`=line.`person_name`;
Related
i m new at neo4j and i d like to upload a csv file and create a set of nodes. However i have already some existing nodes that may exist on that csv file. Is there an option to load the csv, create the nodes based on each row and in case the node already exists skip that row?
Thanks
You can use the MERGE clause to avoid creating duplicate nodes and relationships.
However, you need to carefully read the documentation to understand how to use MERGE, as incorrect usage can cause the unintentional creation of nodes and relationships.
Merge will give you what you want, however you must be careful how you identify the record uniquely to prevent creating duplicates
I'll put the desired final form first as attention spans seem to be on the decline...
// This one is safe assuming name is a true unique identifier of your Friends
// and that their favorite colors and foods may change over time
LOAD CSV FROM 'data/friends.csv' AS line
MERGE (f:Friend { name: line[0]})
set a.favorite_food = line[1]
set a.favorite_color = line[2]
The merge above will create or find the Friend node with that matching name and then, regardless of whether we are creating it or updating it, set the attributes on it.
If we were to instead provide all the attributes in the merge as such:
// This one is dangerous - all attributes must match in order
// to find the existing Friend node
LOAD CSV FROM 'data/friends.csv' AS line
MERGE (f:Friend { name: line[0], favorite_food: line[1], favorite_color: line[2]})
Then we would fail to find an existing friend everytime their favorite_food or favorite_color was updated in our data being (re)loaded.
Here's an example for anyone who's imagination hasn't fully filled in the blanks...
//Last month's file contained:
Bob Marley,Hemp Seeds,Green
//This month's file contained:
Bob Marley,Soylent Green,Rainbow
I have a Node table and an Edge table, both available as CSV files.
I managed to load the Node table by:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///NodesETL.csv' AS line
CREATE (:InfoNodes {id: toString(line.id), description: toString(line.description)})
This query creates the InfoNodes with the field values of the CSV file as properties of :InfoNodes which is fine.
InfoNodes have relationships with other InfoNodes, e.g. these relationships exists between Nodes with the same label.
These relationships are stored in an Edge table available as an additional CSV file.
Every row of this Edge table holds idfrom and idto fields that defines the relationships between InfoNodes on basis of their id property.
The Edge table holds also 3 additional fields representing properties of the relationship. The firstproperty is always a string and never NULL e.g. never an empty string. The secondproperty and thirdproperty, both type string, can have NULL values like "". So secondproperty and/or thirdproperty can contain NULL values.
I try to use this Edge table to create the [:RELATIONSHIP {firstproperty:, secondproperty:, thirdproperty:}] relationships between (:InfoNodes) by:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///EdgesETL.csv' AS line
MATCH (from:InfoNodes{id: toString(line.idfrom)})
MATCH (to:InfoNodes{id: toString(line.idto)})
MERGE (from)-[:RELATION {firstproperty: toString(line.firstproperty), secondproperty: toString(line.secondproperty), thirdproperty: toString(line.thirdproperty)}]->(to)
This second Cypher script results into an error when secondproperty and thirdproperty in the Edge table contain NULL values.
The Neo4j error message is: Cannot merge relationship using null property value for secondproperty.
When I remove from the second script the secondproperty field and secondproperty: property than the same type of error occurs mentioning thirdproperty: Cannot merge relationship using null property value for thirdproperty
When I remove secondproperty and thirdproperty fields and properties from the previous script then the [:RELATIONS] relationships between InfoNodes are created including firstproperty table fields stored as firstproperty: property of the [:RELATION] relationship.
Question: How to extend the second script in order to load from the Edge table the second property and thirdproperty fields into secondproperty: and thirdproperty: of [:RELATION] relationships including NULL values?
Can't MERGE with null values; 'Cannot merge node using null property value' in neo4j describes the same problem but doesn't answer my question in case of multiple fields/properties with NULL values.
You'll want to re-review the MERGE section in the developer guide. Specifically, in the introduction, there's mention of ON CREATE and ON MATCH. This allows you to set properties in cases where the MERGE resulted in a creation, or when instead the MERGE matched upon existing elements.
Typically you will want to MERGE only properties that uniquely define the thing, like IDs, and set the rest of the properties within ON CREATE.
Your query after this change might look something like this:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///EdgesETL.csv' AS line
MATCH (from:InfoNodes{id: toString(line.idfrom)})
MATCH (to:InfoNodes{id: toString(line.idto)})
MERGE (from)-[r:RELATION {firstproperty: toString(line.firstproperty)}]->(to)
ON CREATE SET r.secondproperty = toString(line.secondproperty), r.thirdproperty = toString(line.thirdproperty)
I am trying to load large dataset into neo4j-3 and looking for the options. I found one neo4j-import but the problem with that is it is for initial load only. I have to load 2M records around every week.
I tried loading through shell but having some performance issue, I tried following.
1) Creating constraint upfront.
2) Creating Node and relationships in separate query.
3) Heap space 8G
4) dbms.memory.pagecache 4G
Many times the import just hangs and does nothing for hours.
Edit - CSV load being executed:
USING PERIODIC COMMIT 5000
LOAD CSV WITH HEADERS
FROM "file:///my_sds_39_joe.csv"
AS row
OPTIONAL MATCH (per:Person {UID : "Person."+row.player_cardnum})
WHERE per IS NULL
MERGE (p:Person {CardNumber : row.player_cardnum})
ON CREATE SET p.Creation Date = timestamp(), p.Modification Date = timestamp() ;
EDIT
On a second look, seems like you're trying to implement some kind of conditional logic to your insert.
It looks like what you're trying to do is figure out if a :Person exists with a UID (derived from some concatenation with row.player_cardnum), and in the case where that :Person doesn't exist and the match fails, MERGE a :Person with the CardNumber given by row.player_cardnum.
If this is your goal, you're ALMOST there with your query. The problem is with your WHERE clause.
Understand that WHERE clauses are linked with a preceding MATCH, OPTIONAL MATCH, or WITH, and only affects the linked clause.
With that WHERE on that OPTIONAL MATCH, per will always be null, but more importantly, your row will still exist, and the following MERGE will ALWAYS take place for all rows in the CSV. This is probably the source of your slowdown, as it's creating new :Person nodes for all rows.
If you're trying to null out the row completely when the OPTIONAL MATCH hits on an existing :Person (so the MERGE won't happen in that case), you'll need to add a WITH clause, and make sure your WHERE clause is applied to it instead of the OPTIONAL MATCH.
Additionally, make sure that you have either unique constraints or indexes on Person.UID and Person.CardNumber. As for the UID match, I've heard that indexes are not used when there's some kind of string concatenation of the thing you're matching upon, so you may need to assemble it first and pass it in with a WITH.
Your final query would look like this:
USING PERIODIC COMMIT 5000
LOAD CSV WITH HEADERS
FROM "file:///my_sds_39_joe.csv"
AS row
// first build the UID so we can take advantage of the index
WITH row, "Person." + row.player_cardnum AS UID
OPTIONAL MATCH (per:Person {UID : UID})
// the WHERE now applies to the WITH, which will filter out and null out the row when an OPTIONAL MATCH is found
WITH row, per
WHERE per IS NULL
MERGE (p:Person {CardNumber : row.player_cardnum})
ON CREATE SET p.Creation Date = timestamp(), p.Modification Date = timestamp() ;
My target is to create node + set new property to it in case not exists
if it's exist I just want to update it's property
Tried this:
MATCH (user:C9 {userId:'44'})
CREATE UNIQUE (user{timestamp:'1111'})
RETURN user
*in case the node with the property userId=44 already existed I just want to set it's property into 1111 else just create it and set it.
error I am getting:
user already declared (line 2, column 16 (offset: 46))
"CREATE UNIQUE (user{timestamp:'1111'})"
should I switch to Merge or?
thanks.
Yes, you should use the MERGE statement.
MERGE (user:C9 {userId:'44'})
// you can set some initial properties when the node is created if required
//ON CREATE SET user.propertykey = 'propertyvalue'
ON MATCH SET user.timestamp = '1111'
RETURN user
You mention unique constraints - I assume you have one set up. You definitely should do to prevent duplicate nodes being created. It will also create a schema index to improve the performance of your node lookup. If you do not yet have a unique constraint then it can be created like so
CREATE CONSTRAINT ON (u:C9) ASSERT u.userId IS UNIQUE
See the Neo4j MERGE documentation.
Finally, to understand what is going on in your query let's have a quick look line by line.
MATCH (user:C9 { userId:'44' })
This matches the node with label :C9 that has a userId property with value 44 and assigns it the identifier user.
CREATE UNIQUE (user{timestamp:'1111'})
This line is simply trying to create a new node with no label and a property timestamp with value '1111'. The exception you are seeing is a result of you using the same user identifier that has already been used in the first line. However, this is not a supported way to use CREATE UNIQUE as it requires a match first and will then create bits of the pattern that does not exist. The upside of this is that it is stopping this unwanted node (user{timestamp:'1111'}) being created in the graph.
RETURN user
This line is pretty self explanatory and is not being reached.
EDIT
There seems to be some confusion surrounding CREATE UNIQUE and when it should be used. This query
CREATE UNIQUE (user:C9 {timestamp:'1111'})
will fail with the message
This pattern is not supported for CREATE UNIQUE
To use CREATE UNIQUE you would first match an existing node and then use that to create a unique pattern in the graph. So to create a relationship from user to a second node you would use
MATCH (user:C9 { userId: '44' }
CREATE UNIQUE (user)-[r:FOO]-(bar)
RETURN r
If there is no relationships of type FOO from user then a new node will be created to represent bar and relationship of type :FOO will be created between them. Conversely, if the MATCH statement does not make a match then no nodes or relationships will be created.
I'm using Neo4j 2.0.0-M06. Just learning Cypher and reading the docs. In my mind this query would work, but I should be so lucky...
I'm importing tweets to a mysql-database, and from there importing them to neo4j. If a tweet is already existing in the Neo4j database, it should be updated.
My query:
MATCH (y:Tweet:Socialmedia) WHERE
HAS (y.tweet_id) AND y.tweet_id = '123'
CREATE UNIQUE (n:Tweet:Socialmedia {
body : 'This is a tweet', tweet_id : '123', tweet_userid : '321', tweet_username : 'example'
} )
Neo4j says: This pattern is not supported for CREATE UNIQUE
The database is currently empty on nodes with the matching labels, so there are no tweets what so ever in the Neo4j database.
What is the correct query?
You want to use MERGE for this query, along with a unique constraint.
CREATE CONSTRAINT on (t:Tweet) ASSERT t.tweet_id IS UNIQUE;
MERGE (t:Tweet {tweet_id:'123'})
ON CREATE
SET t:SocialMedia,
t.body = 'This is a tweet',
t.tweet_userid = '321',
t.tweet_username = 'example';
This will use an index to lookup the tweet by id, and do nothing if the tweet exists, otherwise it will set those properties.
I would like to point that one can use a combination of
CREATE CONSTRAINT and then a normal
CREATE (without UNIQUE)
This is for cases where one expects a unique node and wants to throw an exception if the node unexpectedly exists. (Far cheaper than looking for the node before creating it).
Also note that MERGE seems to take more CPU cycles than a CREATE. (It also takes more CPU cycles even if an exception is thrown)
An alternative scenario covering CREATE CONSTRAINT, CREATE and MERGE (though admittedly not the primary purpose of this post).