I already have some data in Neo4j, modelled as follows:
(:A {ID:"123", Group:"ABC", Family:"XYZ"})
(:B {ID:"456", Group:"ABC", Family:"XYZ"})
(:A)-[:SCORE{score:'2'}]-(:B)
Please see the attached image for how the data currently looks.
Now I am importing some new data from a CSV file with 5 columns:
A's ID
B's ID
Score through which A is attached to B
Group
Family
The new data may contain new A IDs or new B IDs.
Question:
I want to create the new nodes of type A and B, create a SCORE relationship between them, and set the relationship's score property to the value from the CSV.
Some of the existing scores between A and B may also have changed, so I just want to replace the previous score with the new one.
How can I write Cypher that does this using a CSV import?
I used the following Cypher query to load the data the first time:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///ABC.csv" AS line
MERGE (a:A {ID: line.A, Group: line.Group, Family: line.Family})
MERGE (b:B {ID: line.B, Group: line.Group, Family: line.Family})
MERGE (a)-[:SCORE {score: toFloat(line.Score)}]-(b)
Note: Family and Group are the same for both node types, A and B.
Thanks in advance.
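You can MERGE the relationship without the score and then SET the score afterwards. Because the score is no longer part of the MERGE pattern, a changed score updates the existing SCORE relationship instead of creating a new one for every new value.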
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///ABC.csv" AS line
MERGE (a:A {ID: line.A, Group:line.Group, Family:line.Family})
MERGE (b:B {ID: line.B, Group:line.Group, Family:line.Family})
MERGE (a)-[score:SCORE]-(b)
SET score.score = toFloat(line.Score)
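The following in-memory Python model shows why the score has to move out of the MERGE pattern. It is only an illustration, not Neo4j code. A list of dicts stands in for the relationships, the two rows are hypothetical, and relationship direction is not modelled. Every property in a MERGE pattern is part of what must match, so a changed score no longer matches and a second relationship gets created.

```python
# Illustrative in-memory model of two Cypher load patterns (not Neo4j code).
# Each relationship is a dict {"a": ..., "b": ...,"score": ...}; direction is ignored.

def merge_with_property(rels, a, b, score):
    """Like MERGE (a)-[:SCORE {score: s}]-(b): the score is part of the match."""
    for r in rels:
        if r["a"] == a and r["b"] == b and r["score"] == score:
            return r  # exact match found, nothing is created
    r = {"a": a, "b": b, "score": score}
    rels.append(r)  # a new score value produces another relationship
    return r


def merge_then_set(rels, a, b, score):
    """Like MERGE (a)-[s:SCORE]-(b) SET s.score = ...: match on endpoints only."""
    for r in rels:
        if r["a"] == a and r["b"] == b:
            r["score"] = score  # existing relationship: overwrite the score
            return r
    r = {"a": a, "b": b, "score": score}
    rels.append(r)
    return r


# Hypothetical rows: the same A/B pair loaded twice with a changed score.
rows = [("123", "456", 2.0), ("123", "456", 3.5)]

old_style, new_style = [], []
for a, b, s in rows:
    merge_with_property(old_style, a, b, s)
    merge_then_set(new_style, a, b, s)

print(len(old_style))  # 2: the stale score survives as its own relationship
print(new_style)       # [{'a': '123', 'b': '456', 'score': 3.5}]
```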
I'm looking to perform periodic refreshes of node data in my Neo4j database. A good example of my needs is company employees: if an employee is terminated, they are removed from the graph completely, and new employees are added.
Deleting all nodes with this label and ingesting a fresh dataset would probably suffice, but it feels quite ugly. Is there a more elegant solution? My fresh data is in a CSV file and I want to pull it in daily.
You could put a 'last updated' timestamp on your nodes. Each day, do your update using MERGE. If the CSV data already exists in the database, update the timestamp using the ON MATCH clause of MERGE. If it doesn't, MERGE will create new nodes (make sure to set the timestamp on them as well). For example:
MERGE (n:Person {<selection_filter>})
ON CREATE SET <required_properties>, n.lastUpdated = date()
ON MATCH SET
n.lastUpdated = date()
After updating the graph with the CSV data, run a query that deletes all nodes whose timestamps are before today's, i.e. nodes that weren't updated.
Creating an index on lastUpdated may improve the performance of the delete query.
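The timestamp approach can be sketched as a small in-memory Python model. This is not Neo4j code. The hypothetical employee IDs and a fixed date stand in for the graph and for date(). Every node seen in the CSV gets today's date, whether it was created or matched, and anything still carrying an older date is swept.

```python
from datetime import date, timedelta

# Hypothetical "graph": employee id -> properties. A fixed date keeps it deterministic.
today = date(2024, 1, 2)
graph = {
    "e1": {"lastUpdated": today - timedelta(days=1)},
    "e2": {"lastUpdated": today - timedelta(days=1)},
}


def refresh(graph, csv_ids, today):
    """Mimic MERGE ... ON CREATE SET / ON MATCH SET n.lastUpdated = date()."""
    for emp_id in csv_ids:
        node = graph.setdefault(emp_id, {})  # created if missing (ON CREATE)
        node["lastUpdated"] = today          # both branches stamp today's date


def sweep(graph, today):
    """Mimic deleting every node whose lastUpdated is before today."""
    stale = [k for k, v in graph.items() if v["lastUpdated"] < today]
    for k in stale:
        del graph[k]
    return stale


refresh(graph, ["e2", "e3"], today)  # e1 has left, e3 has joined
removed = sweep(graph, today)
print(removed)        # ['e1']
print(sorted(graph))  # ['e2', 'e3']
```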
If your CSV file contains all active employees, you can do something like this, running the three statements separately:
MATCH (e:Employee)
SET e.ActiveToday = false;
LOAD CSV WITH HEADERS FROM "file:///employees.csv" AS line
MERGE (e:Employee {employeeID: line.employeeID})
SET e.ActiveToday = true;
MATCH (e:Employee {ActiveToday: false})
DETACH DELETE e;
The MERGE creates nodes for new employees in the file and matches those that already exist. Both created and matched nodes get ActiveToday set to true. From there, you just match the nodes where the property is still false and remove them.
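The same three steps can be modelled in plain Python, with a dict standing in for the :Employee nodes. The IDs are hypothetical, and this illustrates the logic only; it is not Neo4j code.

```python
# Hypothetical current graph and today's CSV: e1 was terminated, e3 was hired.
graph = {"e1": {"ActiveToday": True}, "e2": {"ActiveToday": True}}
csv_rows = [{"employeeID": "e2"}, {"employeeID": "e3"}]

# Step 1: MATCH (e:Employee) SET e.ActiveToday = false
for props in graph.values():
    props["ActiveToday"] = False

# Step 2: MERGE each CSV employee (create or match) and mark it active
for row in csv_rows:
    graph.setdefault(row["employeeID"], {})["ActiveToday"] = True

# Step 3: DETACH DELETE every employee still flagged false
for emp_id in [k for k, v in graph.items() if not v["ActiveToday"]]:
    del graph[emp_id]

print(sorted(graph))  # ['e2', 'e3']
```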
How can I create a relationship from one node to another node with the same label? I have one node label (p:person), and my CSV has 2 columns: name and vice. Each row in my CSV represents a person who was a CEO and their VP at the time. Sometimes VPs later became CEOs, so I want to show that relationship. Here is what I was trying, but with no luck. If I do not include the WITH, I receive an error saying I need it, but when I add * or a property, it says it cannot find the row. I'm stuck.
:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///ceo_vp.csv' AS row
CREATE (p:person {name:coalesce(row.name,'UNK')})
MATCH (p:person {name:row.vice })
WITH *
CREATE (p)-[:was_vp_for]->(p)
There is a typo with the variable p: you must assign a different variable name to the VP. Here is the script:
LOAD CSV WITH HEADERS FROM 'file:///ceo_vp.csv' AS row
MERGE (ceo:person {name:coalesce(row.name,'UNK')})
MERGE (vice:person {name:row.vice })
CREATE (vice)-[:was_vp_for]->(ceo)
Notice that I used MERGE because, as you said, a VP can be a former CEO (and vice versa), so MERGE is better than CREATE. If the person already exists, MERGE matches the existing node instead of creating a duplicate.
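Here is a rough Python model of what the script does. The two-row CSV is hypothetical, and the dict only stands in for MERGE on :person(name). Because people are looked up by name, Bob ends up as a single node that is both a VP and a CEO.

```python
import csv
import io

# Hypothetical ceo_vp.csv: Bob was Alice's VP, then became CEO with Carol as VP.
data = "name,vice\nAlice,Bob\nBob,Carol\n"

people = {}      # name -> node; merging on name keeps one entry per person
was_vp_for = []  # (vice name, ceo name) relationships


def merge_person(name):
    """Mimic MERGE (p:person {name: ...}): return the existing node or create it."""
    return people.setdefault(name, {"name": name})


for row in csv.DictReader(io.StringIO(data)):
    # Approximates coalesce(row.name, 'UNK'): LOAD CSV reads empty fields as null.
    ceo = merge_person(row["name"] or "UNK")
    vice = merge_person(row["vice"])
    # CREATE (vice)-[:was_vp_for]->(ceo)
    was_vp_for.append((vice["name"], ceo["name"]))

print(sorted(people))  # ['Alice', 'Bob', 'Carol']
print(was_vp_for)      # [('Bob', 'Alice'), ('Carol', 'Bob')]
```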
I am working on a StreamSets pipeline that reads data from an active file directory, where .csv files are uploaded remotely, and puts the data into a Neo4j database.
The steps I use are:
Create an observation node for each row in the .csv
Create a CSV node and a relationship between the CSV node and the record
Update the timestamp on the Burn_In_Test nodes (already created in the graph database by a different pipeline) with the timestamp from the CSV node, if it is the latest
Create a relationship from the CSV node to the burn-in test
Delete outdated relationships based on the latest timestamp
I am doing all of this with a JDBC query, and the Cypher query used is:
MERGE (m:OBSERVATION{
SerialNumber: "${record:value('/SerialNumber')}",
Test_Stage: "${record:value('/Test_Stage')}",
CUR: "${record:value('/CUR')}",
VOLT: "${record:value('/VOLT')}",
Rel_Lot: "${record:value('/Rel_Lot')}",
TimestampINT: "${record:value('/TimestampINT')}",
Temp: "${record:value('/Temp')}",
LP: "${record:value('/LP')}",
MON: "${record:value('/MON')}"
})
MERGE (t:CSV{
SerialNumber: "${record:value('/SerialNumber')}",
Test_Stage: "${record:value('/Test_Stage')}",
TimestampINT: "${record:value('/TimestampINT')}"
})
WITH m
MATCH (t:CSV) where t.SerialNumber=m.SerialNumber and t.Test_Stage=m.Test_Stage and t.TimestampINT=m.TimestampINT MERGE (m)-[:PART_OF]->(t)
WITH t, t.TimestampINT AS TimestampINT
MATCH (rl:Burn_In_Test) where rl.SerialNumber=t.SerialNumber and rl.Test_Stage=t.Test_Stage and rl.TimestampINT<TimestampINT
SET rl.TimestampINT=TimestampINT
WITH t
MATCH (rl:Burn_In_Test) where rl.SerialNumber=t.SerialNumber and rl.Test_Stage=t.Test_Stage
MERGE (t)-[:POINTS_TO]->(rl)
WITH rl
MATCH (t:CSV)-[r:POINTS_TO]->(rl) WHERE t.TimestampINT<rl.TimestampINT
DELETE r
Right now this process is very slow: it takes about 15 minutes for 10 records. Can it be optimized further?
A best practice when using MERGE is to merge on a single property and then use SET to add the other properties.
If I assume that the SerialNumber property is unique for every node (it might not be), it would look like this:
MERGE (m:OBSERVATION{SerialNumber: "${record:value('/SerialNumber')}"})
SET m.Test_Stage = "${record:value('/Test_Stage')}",
m.CUR= "${record:value('/CUR')}",
m.VOLT= "${record:value('/VOLT')}",
m.Rel_Lot= "${record:value('/Rel_Lot')}",
m.TimestampINT = "${record:value('/TimestampINT')}",
m.Temp= "${record:value('/Temp')}",
m.LP= "${record:value('/LP')}",
m.MON= "${record:value('/MON')}"
MERGE (t:CSV{
SerialNumber: "${record:value('/SerialNumber')}"
})
SET t.Test_Stage = "${record:value('/Test_Stage')}",
t.TimestampINT = "${record:value('/TimestampINT')}"
WITH m, t
MERGE (m)-[:PART_OF]->(t)
WITH t, t.TimestampINT AS TimestampINT
MATCH (rl:Burn_In_Test) where rl.SerialNumber=t.SerialNumber and rl.Test_Stage=t.Test_Stage and rl.TimestampINT<TimestampINT
SET rl.TimestampINT=TimestampINT
WITH t
MATCH (rl:Burn_In_Test) where rl.SerialNumber=t.SerialNumber and rl.Test_Stage=t.Test_Stage
MERGE (t)-[:POINTS_TO]->(rl)
WITH rl
MATCH (t:CSV)-[r:POINTS_TO]->(rl) WHERE t.TimestampINT<rl.TimestampINT
DELETE r
Another thing to add: I would probably split this into two queries.
The first would do the import and the second would delete the outdated relationships. Also add unique constraints and indexes where possible.
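Here is a minimal in-memory Python sketch of the two-query split. It assumes that each CSV file gets its own node (the hypothetical csv_key) and that Burn_In_Test nodes are identified by SerialNumber and Test_Stage. It illustrates the intended logic, not the Cypher itself. The first pass links files and keeps the newest timestamp, and the second drops links from files older than that timestamp.

```python
# Hypothetical state: one Burn_In_Test node keyed by (SerialNumber, Test_Stage).
burn_in = {("SN1", "S1"): {"TimestampINT": 100}}
csv_nodes = {}     # csv file key -> its TimestampINT
points_to = set()  # (csv key, burn-in key) relationships


def import_csv(csv_key, serial, stage, ts):
    """Query 1: record the CSV node, keep the newest timestamp, link them."""
    csv_nodes[csv_key] = ts
    test_key = (serial, stage)
    test = burn_in.get(test_key)
    if test is None:
        return  # no matching Burn_In_Test node, nothing to link
    test["TimestampINT"] = max(test["TimestampINT"], ts)
    points_to.add((csv_key, test_key))


def delete_outdated():
    """Query 2: drop POINTS_TO links from CSV nodes older than the test node."""
    stale = {(c, t) for c, t in points_to
             if csv_nodes[c] < burn_in[t]["TimestampINT"]}
    points_to.difference_update(stale)
    return stale


import_csv("file_a.csv", "SN1", "S1", 150)
import_csv("file_b.csv", "SN1", "S1", 200)
print(burn_in[("SN1", "S1")])  # {'TimestampINT': 200}
print(delete_outdated())       # {('file_a.csv', ('SN1', 'S1'))}
print(points_to)               # {('file_b.csv', ('SN1', 'S1'))}
```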
I have a dataset that looks like this:
As you can see, a single person has multiple skill IDs. Along with this data, I also have a skill_ref table with 2 columns (skillid and skillname), so from the image above I can tell that the last person has multiple skills. I want to put this data into Neo4j with person and skill name as nodes and a has_skill relationship between them. But I don't know how to handle the multiple instances: if I split the skillid, I will get multiple instances of the person's name, which is not what I want. I want something like this:
In the graph, the center node is the person's name and the other nodes hold skill names, with arrows representing the has_skill relationship.
I am new to Neo4j as well as Cypher; any help will be highly appreciated.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///people_data.csv" AS line
CREATE (p:Person{id:line.people_uid})
WITH line, p
SET p.firstName = line.first_name,p.lastname=line.last_name
WITH line,split(line.skillid,' ') as skill_ids
UNWIND skill_ids as skill_Id
MERGE (skill:Skill{id:skill_Id})
LOAD CSV WITH HEADERS FROM "file:///skills_ref.csv" AS line
WITH line
MERGE(skill:line.skillid{name:line.skillname})
CREATE (p)-[:HAS_SKILL]->(skill)
You've got the right idea.
The general approach is to first MERGE (or CREATE) the :Person node, then split() the skillid into a list of skill ids, UNWIND the skill id list into rows, then MERGE the skill for the given id (and make sure you have an index or unique constraint on :Skill(id)), then MERGE (or CREATE) the relationship between the :Person node and the :Skill.
Here's an example, loading from a CSV file:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///your.file.url" AS line
CREATE (p:Person{id:line.people_uid})
WITH line, p
SET p.firstName = line.f_name ... <same for the rest of the properties>
UNWIND split(line.skillid, ' ') as skillId
MERGE (skill:Skill{id:skillId})
CREATE (p)-[:HAS_SKILL]->(skill)
EDIT
Regarding the revised query you're attempting, it's actually better to use a separate query for each CSV load: the first creates the nodes and relationships, and the second just matches/merges the skills and adds the skill names:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///people_data.csv" AS line
CREATE (p:Person{id:line.people_uid})
SET p.firstName = line.first_name, p.lastname=line.last_name
WITH p, split(line.skillid, ' ') AS skill_ids
UNWIND skill_ids as skill_Id
MERGE (skill:Skill{id:skill_Id})
CREATE (p)-[:HAS_SKILL]->(skill)
Then your next query:
LOAD CSV WITH HEADERS FROM "file:///skills_ref.csv" AS line
MERGE (skill:Skill{id:line.skillid})
SET skill.name = line.skillname
Remember that you should create unique constraints on :Skill(id) and :Person(id) first.
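The two-query approach can be modelled in plain Python. The model shows why splitting on spaces and unwinding still leaves a single node per person. The sample CSV contents are hypothetical, and the dicts only stand in for :Person and :Skill nodes.

```python
import csv
import io

# Hypothetical samples of people_data.csv and skills_ref.csv.
people_csv = "people_uid,first_name,last_name,skillid\n1,Ann,Lee,10 20\n2,Bo,Kim,20\n"
skills_csv = "skillid,skillname\n10,Cypher\n20,Python\n"

persons, skills, has_skill = {}, {}, []

# Query 1: one Person per row, then UNWIND split(line.skillid, ' ')
for line in csv.DictReader(io.StringIO(people_csv)):
    p = {"id": line["people_uid"], "firstName": line["first_name"],
         "lastname": line["last_name"]}
    persons[p["id"]] = p
    for skill_id in line["skillid"].split(" "):
        skills.setdefault(skill_id, {"id": skill_id})  # MERGE (skill:Skill {id: ...})
        has_skill.append((p["id"], skill_id))         # CREATE (p)-[:HAS_SKILL]->(skill)

# Query 2: MERGE each Skill by id and SET its name from the reference file
for line in csv.DictReader(io.StringIO(skills_csv)):
    skills.setdefault(line["skillid"], {"id": line["skillid"]})["name"] = line["skillname"]

print(len(persons), len(skills))  # 2 2
print(has_skill)                  # [('1', '10'), ('1', '20'), ('2', '20')]
print(skills["20"])               # {'id': '20', 'name': 'Python'}
```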
I am making a Neo4j graph to show a network of music artists.
I have a CSV with a few columns. The first column is called Artist and is the person who made the song. The second and third columns are called Feature1 and Feature2, respectively, and represent the featured artists on a song (see example https://docs.google.com/spreadsheets/d/1TE8MtNy6XnR2_QE_0W8iwoWVifd6b7KXl20oCTVo5Ug/edit?usp=sharing)
I have merged so that any given artist has just a single node. Artists are connected by a FEATURED relationship with a strength property that represents the number of times someone has been featured. When the relationship is initialized, the relationship property strength is set to 1. For example, when (X)-[r:FEATURED]->(Y) occurs the first time r.strength = 1.
CREATE CONSTRAINT ON (a:artist) ASSERT a.artistName IS UNIQUE;
CREATE CONSTRAINT ON (f:feature) ASSERT f.artistName IS UNIQUE;
CREATE CONSTRAINT ON (f:feature1) ASSERT f.artistName IS UNIQUE;
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS from 'aws/artist-test.csv' as line
MERGE (artist:Artist {artistName: line.Artist})
MERGE (feature:Artist {artistName: line.Feature1})
MERGE (feature1:Artist {artistName: line.Feature2})
CREATE (artist)-[:FEATURES {strength:1}]->(feature)
CREATE (artist)-[:FEATURES {strength:1}]->(feature1)
Then I deleted the 'None' node for songs that have no features:
MATCH (artist:Artist {artistName:'None'})
OPTIONAL MATCH (artist)-[r]-()
DELETE artist, r
If X features Y on another song further down the CSV, the code currently creates another (duplicate) relationship with r.strength = 1. Rather than creating a new relationship, I'd like to have only the one (previously created) relationship and increase the value of r.strength by 1.
Any idea how I can do this? My current approach has been to create a bunch of duplicate relationships, then go back through, count the duplicates, and set r.strength = #duplicate relationships. However, I haven't been able to get this to work, and before I waste more time on it, I figured there must be a more efficient way to accomplish this.
Any help is greatly appreciated. Thanks!
You can use MERGE on the relationships with ON CREATE SET and ON MATCH SET:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS from 'aws/artist-test.csv' as line
MERGE (artist:Artist {artistName: line.Artist})
MERGE (feature:Artist {artistName: line.Feature1})
MERGE (feature1:Artist {artistName: line.Feature2})
MERGE (artist)-[f1:FEATURES]->(feature)
ON CREATE SET f1.strength = 1
ON MATCH SET f1.strength = f1.strength + 1
MERGE (artist)-[f2:FEATURES]->(feature1)
ON CREATE SET f2.strength = 1
ON MATCH SET f2.strength = f2.strength + 1
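Here is a small Python model of the ON CREATE / ON MATCH counting, using hypothetical rows. A dict keyed by the directed (artist, featured) pair stands in for the FEATURES relationships. As with the directed MERGE pattern, X featuring Y and Y featuring X are counted separately.

```python
import csv
import io

# Hypothetical artist-test.csv; 'None' marks a missing feature.
data = "Artist,Feature1,Feature2\nX,Y,None\nX,Y,Z\nY,X,None\n"

strength = {}  # (artist, featured) -> strength, one entry per FEATURES relationship


def merge_features(artist, featured):
    # MERGE (artist)-[f:FEATURES]->(featured)
    # ON CREATE SET f.strength = 1 / ON MATCH SET f.strength = f.strength + 1
    key = (artist, featured)
    strength[key] = strength.get(key, 0) + 1


for row in csv.DictReader(io.StringIO(data)):
    merge_features(row["Artist"], row["Feature1"])
    merge_features(row["Artist"], row["Feature2"])

# Equivalent of deleting the 'None' node and its relationships afterwards
strength = {k: v for k, v in strength.items() if "None" not in k}
print(strength)  # {('X', 'Y'): 2, ('X', 'Z'): 1, ('Y', 'X'): 1}
```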