I have the following query to create Person nodes:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "http://192.168.11.121/movie-reco-db/person_node.csv" as row
CREATE (:Person {personId: row.person_id, name: row.name});
I created an index on personId. person_node.csv is a file I exported from a MySQL database. The query works fine, but the CSV file will contain new records each time I export it. If I run the query again, it creates duplicate nodes. If I set a UNIQUE index on personId instead, it fails with:
Node 0 already exists with label Person and property "personId"=[1]
and does not insert the new records. Is there an elegant way to update a record if it already exists, or create a new one if it does not?
You're looking for the MERGE operation, which will attempt a match, and if it doesn't find the thing, it will create it. Be aware that MERGE matches on the entire pattern you give it: if you merge a node with a personId and a name, and an existing node has that personId but a slightly different name, the match fails and MERGE creates a new node.
If you have a unique ID for the node, merge on that alone, then use ON CREATE to add the remaining properties. ON CREATE is only executed when MERGE creates something instead of matching existing entities in your db. Its counterpart, ON MATCH, is only executed when MERGE matches instead of creating.
Your final query will look like:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "http://192.168.11.121/movie-reco-db/person_node.csv" as row
MERGE (p:Person {personId: row.person_id})
ON CREATE SET p.name = row.name;
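If existing records should also be refreshed when the CSV changes, a minimal sketch (reusing the same file URL and property names as above, and assuming name is the only property that needs updating) is to set it on both branches. USING PERIODIC COMMIT is kept to match the query above; check whether your Neo4j version still supports it.

```cypher
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "http://192.168.11.121/movie-reco-db/person_node.csv" AS row
// Match or create on the unique key only
MERGE (p:Person {personId: row.person_id})
// New node: set its properties
ON CREATE SET p.name = row.name
// Existing node: overwrite with the latest value from the CSV
ON MATCH SET p.name = row.name;
```

Since both branches do the same thing here, a plain SET p.name = row.name after the MERGE would have the same effect.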
I'm looking to perform periodic refreshes of node data in my neo4j db. A good example of my needs would be company employees -- where if an employee is terminated, they are removed from the graph completely and new employees are added.
Really, deleting all nodes of this label and ingesting a fresh dataset likely suffices -- but it feels quite ugly. Is there a more elegant solution? My fresh data exists in csv and I want to pull it in daily.
You could put a 'last updated' timestamp on your nodes. Each day, do your update using MERGE. If a row from the CSV already exists in the database, update its timestamp using the ON MATCH clause of MERGE. If it doesn't exist, MERGE will create a new node (make sure to set a timestamp property on it as well). E.g.:
MERGE (n:Person {<selection_filter>})
ON CREATE SET <required_properties>, n.lastUpdated = date()
ON MATCH SET
n.lastUpdated = date()
After updating the graph with csv data, run a query which deletes all nodes whose timestamps are before today's; i.e. haven't been updated.
You might find creating an index on lastUpdated will improve performance for the delete query.
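The clean-up step might look like the sketch below. It assumes the Person label and the lastUpdated property from the example above, and that lastUpdated was written with date() during today's import. The index statement uses the older syntax; newer Neo4j versions expect CREATE INDEX FOR (n:Person) ON (n.lastUpdated) instead.

```cypher
// Optional: index the timestamp so the delete query can find stale nodes quickly
CREATE INDEX ON :Person(lastUpdated);

// Remove every node that today's import did not touch
MATCH (n:Person)
WHERE n.lastUpdated < date()
DETACH DELETE n;
```

DETACH DELETE is used rather than DELETE so that nodes which still have relationships can be removed too.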
If your CSV file contains all active employees, then you can run three separate statements like this:
MATCH (e:Employee)
SET e.ActiveToday = false;

LOAD CSV WITH HEADERS FROM "file:///employees.csv" AS line
MERGE (e:Employee {employeeID: line.employeeID})
SET e.ActiveToday = true;

MATCH (e:Employee {ActiveToday: false})
DETACH DELETE e;
The MERGE will create nodes for new employees in the file and match those that already exist. Both creates and matches will have their ActiveToday property updated. From there, you just match those where the property is still false and remove them.
I am trying to import a CSV file into Neo4j and create a list (collection) property on a node.
I have tried with the below code but it creates multiple nodes for the values in csvline.name.
LOAD CSV WITH HEADERS FROM "file:\\persons1.csv" AS csvLine
merge (p:Persons {id: toInteger(csvLine.id), name: [csvLine.name]})
CREATE (n:Person{name:'john',age:34,gender:'m', phone_no:[1234,5678]})
I expect only one node to be created in the above case, with a property holding the collection of phone numbers.
Since your CREATE clause is in the same Cypher statement as the LOAD CSV, it will be executed once for every csvLine value.
You will need to run the CREATE clause separately if you want it to be executed only once. (But you may still end up with 2 Person nodes with the name, "John", since a MERGE call may have already created one.)
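As for the list property itself, one possible approach is to aggregate the values per id before merging. This is only a sketch: it assumes persons1.csv repeats the same id on several rows with a different name on each, and the file:/// URL form is also an assumption.

```cypher
LOAD CSV WITH HEADERS FROM "file:///persons1.csv" AS csvLine
// Group the rows by id and gather all names for that id into one list
WITH toInteger(csvLine.id) AS id, collect(csvLine.name) AS names
// Merge on the id only, so each id produces exactly one node
MERGE (p:Persons {id: id})
SET p.name = names;
```

Merging on the id alone avoids the duplicates, because the name list is no longer part of the pattern being matched.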
I have a data set, which looks like this
As you can see, a single person can have multiple skill ids. Along with this data, I also have a skill_ref table with two columns (skillid and skillname), so from the image above I can tell that the last person has multiple skills. I want to load this data into Neo4j, with persons and skill names as nodes and a has_skill relationship between them. But I don't know how to handle the multiple values: if I split the skillid, I will have multiple instances of the person name, which is not what I want. I want something like this:
In the graph, the center node is the name of the person and the other nodes hold skill names, with arrows representing the has_skill relationship.
I am new to neo4j as well as cypher, any help guys will be highly appreciated.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///people_data.csv" AS line
CREATE (p:Person{id:line.people_uid})
WITH line, p
SET p.firstName = line.first_name,p.lastname=line.last_name
WITH line,split(line.skillid,' ') as skill_ids
UNWIND skill_ids as skill_Id
MERGE (skill:Skill{id:skill_Id})
LOAD CSV WITH HEADERS FROM "file:///skills_ref.csv" AS line
WITH line
MERGE(skill:line.skillid{name:line.skillname})
CREATE (p)-[:HAS_SKILL]->(skill)
You've got the right idea.
The general approach is to first MERGE (or CREATE) the :Person node, then split() the skillid into a list of skill ids, UNWIND the skill id list into rows, then MERGE the skill for the given id (and make sure you have an index or unique constraint on :Skill(id)), then MERGE (or CREATE) the relationship between the :Person node and the :Skill.
Here's an example, loading from a CSV file:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///your.file.url" AS line
CREATE (p:Person{id:line.people_uid})
SET p.firstName = line.f_name ... <same for the rest of the properties>
WITH line, p
UNWIND split(line.skillid, ' ') as skillId
MERGE (skill:Skill{id:skillId})
CREATE (p)-[:HAS_SKILL]->(skill)
EDIT
Regarding the revised query you're attempting, it's actually better to use separate queries for each csv load, using the first to create the nodes and merge the relationships, and the second just to match/merge to skills and add the skill name:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///people_data.csv" AS line
CREATE (p:Person{id:line.people_uid})
SET p.firstName = line.first_name, p.lastname=line.last_name
WITH p, split(line.skillid, ' ') as skill_ids
UNWIND skill_ids as skill_Id
MERGE (skill:Skill{id:skill_Id})
CREATE (p)-[:HAS_SKILL]->(skill)
Then your next query:
LOAD CSV WITH HEADERS FROM "file:///skills_ref.csv" AS line
MERGE (skill:Skill{id:line.skillid})
SET skill.name = line.skillname
Remember that you should create a unique constraint on :Skill(id) and on :Person(id) first.
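A sketch of those constraints, written in the older CREATE CONSTRAINT ON ... ASSERT form. Newer Neo4j versions use CREATE CONSTRAINT FOR (s:Skill) REQUIRE s.id IS UNIQUE instead, so adjust to your version.

```cypher
// Run these once, before the imports, so MERGE lookups on id are index-backed
CREATE CONSTRAINT ON (s:Skill) ASSERT s.id IS UNIQUE;
CREATE CONSTRAINT ON (p:Person) ASSERT p.id IS UNIQUE;
```

A unique constraint also creates an index on the property, so no separate index is needed for these lookups.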
I am relatively new to neo4j.
I have imported dataset of 12 million records and I have created a relationship between two nodes. When I created the relationship, I forgot to attach a property to the relationship. Now I am trying to set the property for the relationship as follows.
LOAD CSV WITH HEADERS FROM 'file:///FileName.csv' AS row
MATCH (user:User{userID: USERID})
MATCH (order:Order{orderID: OrderId})
MATCH(user)-[acc:ORDERED]->(order)
SET acc.field1=field1,
acc.field2=field2;
But this query takes too long to execute.
I even tried USING INDEX on the user and order nodes:
MATCH (user:User{userID: USERID}) USING INDEX user:User(userID)
Isn't it possible to create new attributes for the relationship at a later point?
Please let me know, how can I do this operation in a quick and efficient way.
You forgot to prefix your query with USING PERIODIC COMMIT.
Without it, your query will build up transaction state for 24 million changes (two property updates for each of the 12 million rows) and won't have enough memory to keep all that state.
You also forgot row. for the data that comes from your CSV and those names are inconsistently spelled.
If you run this from neo4j browser pay attention to any YELLOW warning signs.
Run
CREATE CONSTRAINT ON (u:User) ASSERT u.userID IS UNIQUE;
Run
CREATE CONSTRAINT ON (o:Order) ASSERT o.orderID IS UNIQUE;
Run
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///FileName.csv' AS row
WITH row, row.USERID AS userID, row.OrderId AS orderID
MATCH (user:User{userID: userID})
USING INDEX user:User(userID)
MATCH (order:Order{orderID: orderID})
USING INDEX order:Order(orderID)
MATCH(user)-[acc:ORDERED]->(order)
SET acc.field1=row.field1, acc.field2=row.field2;
I have a set of CSV files with duplicate data, i.e. the same row might (and does) appear in multiple files. Each row is uniquely identified by one of the columns (id) and has quite a few other columns that indicate properties, as well as required relationships (i.e. ids of other nodes to link to). The files all have the same format.
My problem is that, due to size and number of the files, I want to avoid processing the rows that already exist - I know that as long as id is the same, the contents of the rows will be the same across the files.
Can any Cypher wizard advise how to write a query that would create the node, set all the properties and create all the relationships if a node with the given id does not exist, but skip the action altogether if such a node is found? I tried with MERGE ... ON CREATE, something along the lines of:
LOAD CSV WITH HEADERS FROM "..." AS row
MERGE (f:MyLabel {id:row.uniqueId})
ON CREATE SET f....
WITH f,row
MATCH (otherNode:OtherLabel {id : row.otherNodeId})
MERGE (f) -[:REL1] -> (otherNode)
Unfortunately, ON CREATE only lets me avoid setting the properties again. I couldn't work out how to skip merging the relationships (only one is shown here, but there are quite a few more).
Thanks in advance!
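You can optionally match the node, then keep only the rows where no node was found by filtering with WITH ... WHERE existing IS NULL. The WHERE has to follow a WITH: placed directly after OPTIONAL MATCH, it would only become part of the optional pattern and would not drop any rows.
Make sure you have an index or constraint on :MyLabel(id)
LOAD CSV WITH HEADERS FROM "..." AS row
OPTIONAL MATCH (existing:MyLabel {id:row.uniqueId})
WITH row, existing
WHERE existing IS NULL
MERGE (f:MyLabel {id:row.uniqueId})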
ON CREATE SET f....
WITH f,row
MATCH (otherNode:OtherLabel {id : row.otherNodeId})
MERGE (f) -[:REL1] -> (otherNode)