I am making a Neo4j graph to show a network of music artists.
I have a CSV with a few columns. The first column is called Artist and is the person who made the song. The second and third columns are called Feature1 and Feature2, respectively, and represent the featured artists on a song (see example https://docs.google.com/spreadsheets/d/1TE8MtNy6XnR2_QE_0W8iwoWVifd6b7KXl20oCTVo5Ug/edit?usp=sharing)
I have merged so that any given artist has just a single node. Artists are connected by a FEATURED relationship with a strength property that represents the number of times someone has been featured. When the relationship is initialized, the relationship property strength is set to 1. For example, when (X)-[r:FEATURED]->(Y) occurs the first time r.strength = 1.
CREATE CONSTRAINT ON (a:artist) ASSERT a.artistName IS UNIQUE;
CREATE CONSTRAINT ON (f:feature) ASSERT f.artistName IS UNIQUE;
CREATE CONSTRAINT ON (f:feature1) ASSERT f.artistName IS UNIQUE;
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS from 'aws/artist-test.csv' as line
MERGE (artist:Artist {artistName: line.Artist})
MERGE (feature:Artist {artistName: line.Feature1})
MERGE (feature1:Artist {artistName: line.Feature2})
CREATE (artist)-[:FEATURES {strength:1}]->(feature)
CREATE (artist)-[:FEATURES {strength:1}]->(feature1)
Then I deleted the None node for songs that have no features
MATCH (artist:Artist {artistName:'None'})
OPTIONAL MATCH (artist)-[r]-()
DELETE artist, r
If X features Y on another song further down the CSV, the code currently creates another (duplicate) relationship with r.strength = 1. Rather than creating a new relationship, I'd like to have only the one (previously created) relationship and increase the value of r.strength by 1.
Any idea how I can do this? My current approach has been to just create a bunch of duplicate relationships, then go back through, count all the duplicate relationships, and set r.strength = #duplicate relationships. However, I haven't been able to get this to work, and before I waste more time on it, I figured there might be a more efficient way to accomplish this.
Any help is greatly appreciated. Thanks!
You can use MERGE on relationships with ON CREATE SET and ON MATCH SET:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS from 'aws/artist-test.csv' as line
MERGE (artist:Artist {artistName: line.Artist})
MERGE (feature:Artist {artistName: line.Feature1})
MERGE (feature1:Artist {artistName: line.Feature2})
MERGE (artist)-[f1:FEATURES]->(feature)
ON CREATE SET f1.strength = 1
ON MATCH SET f1.strength = f1.strength + 1
MERGE (artist)-[f2:FEATURES]->(feature1)
ON CREATE SET f2.strength = 1
ON MATCH SET f2.strength = f2.strength + 1
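If songs without features are marked with the literal value None, as the question describes, one option is to filter those values out before merging, so that no None node is created and later deleted. A minimal sketch, assuming the placeholder is exactly the string 'None'; the variable names featured and name are only illustrative:

```cypher
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'aws/artist-test.csv' AS line
// Always create the main artist, even when the song has no features
MERGE (artist:Artist {artistName: line.Artist})
// Keep only real feature names (assumption: 'None' marks "no feature")
WITH artist,
     [name IN [line.Feature1, line.Feature2]
      WHERE name IS NOT NULL AND name <> 'None'] AS featured
UNWIND featured AS name
MERGE (feature:Artist {artistName: name})
MERGE (artist)-[r:FEATURES]->(feature)
ON CREATE SET r.strength = 1
ON MATCH SET r.strength = r.strength + 1
```

UNWIND over an empty list produces no rows, so a song without features simply adds no relationships. Note that running the same import twice would increment strength a second time.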
Related
I'm trying to load a sparse (co-occurrence) matrix in Neo4j but after many failed queries, it's getting frustrating.
Raw data
Basically, I want to create the nodes from the ids, and the relationship weight against each other node (including itself) should be the value on the matrix.
So, for example, 'nhs' should have a self-relationship with weight 41 and 16 with 'england', and so on.
I was trying things like:
LOAD CSV WITH HEADERS FROM 'file:///matpharma.csv' AS row
MERGE (a: node{name: row.id})
MERGE (b: node{name: row.key})
MERGE (a)-[:w]-(b);
I'm not sure how to attach the edge values though (and not yet sure if the merges are producing the expected result).
Thanks in advance for the assistance
If you just need to add a property on a relationship, where the property value is in your CSV, then it's just a matter of adding a variable for the relationship that you MERGE in, and then using SET (or ON CREATE SET, if you only want to set the property if the relationship didn't exist and needed to be created). So something like:
LOAD CSV WITH HEADERS FROM 'file:///matpharma.csv' AS row
MERGE (a: node{name: row.id})
MERGE (b: node{name: row.key})
MERGE (a)-[r:w]-(b)
SET r.weight = row.weight
EDIT
Ah, took a look at the CSV clip. This is a very strange way to format your data. You have data in your header (that is, your headers are trying to define the other node to look up), which is the wrong way to go about this. You should instead have, per row, one column that defines one of the two nodes to connect (like the "id" column) and then another column for the other node (something like an "id2"). That way you can just do two MATCHes to get your nodes, then a MERGE between them, and then set the relationship property, similar to the sample query I provided above.
But if you're set on this format, then it's going to be a more complicated query, since we have to deal with dynamic access of the row keys and values.
Something like:
LOAD CSV WITH HEADERS FROM 'file:///matpharma.csv' AS row
MERGE (start:Node {name:row.id})
WITH start, row, [key in keys(row) WHERE key <> 'id'] as keys
FOREACH (key in keys |
MERGE (end:Node {name:key})
MERGE (start)-[r:w]-(end)
ON CREATE SET r.weight = row[key] )
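For comparison, if the data were flattened into one row per pair as suggested earlier in this answer, the load could look roughly like this. The file name matpharma_flat.csv and the column names id, id2 and weight are assumptions for illustration, and MERGE is used for the nodes so the sketch also works on an empty database:

```cypher
// Hypothetical flattened file with headers: id,id2,weight
LOAD CSV WITH HEADERS FROM 'file:///matpharma_flat.csv' AS row
MERGE (a:node {name: row.id})
MERGE (b:node {name: row.id2})
// Undirected MERGE: (nhs, england) and (england, nhs) map to one relationship
MERGE (a)-[r:w]-(b)
// LOAD CSV reads every field as a string, so convert the weight
SET r.weight = toInteger(row.weight)
```

When id and id2 name the same node, the MERGE produces a self-relationship, which matches the 'nhs' to 'nhs' case in the question.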
This is a nice Cypher challenge :) Let's say that LOAD CSV is not really meant to do this and probably you would be happier by flattening your data
Here is what I came up with :
LOAD CSV FROM "https://gist.githubusercontent.com/ikwattro/a5260d131f25bcce97c945cb97bc0bee/raw/4ce2b3421ad80ca946329a0be8a6e79ca025f253/data.csv" AS row
WITH collect(row) AS rows
WITH rows, rows[0] AS firstRow
UNWIND rows AS row
WITH firstRow, row SKIP 1
UNWIND range(0, size(row)-2) AS i
RETURN firstRow[i+1], row[0], row[i+1]
You can take a look at the gist
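The query above only returns the (column name, row name, value) triples. A possible way to turn those triples into nodes and weighted relationships is sketched below. It assumes the first row holds the column names, the first column holds the row names, and empty cells mean "no relationship"; the Node label and w relationship type are borrowed from the question:

```cypher
// No WITH HEADERS: each row arrives as a list of strings
LOAD CSV FROM 'file:///matpharma.csv' AS row
WITH collect(row) AS rows
// First row is the header; the remaining rows are the matrix body
WITH rows[0] AS firstRow, rows[1..] AS dataRows
UNWIND dataRows AS row
UNWIND range(1, size(row) - 1) AS i
WITH row[0] AS source, firstRow[i] AS target, row[i] AS weight
// Skip empty cells of the sparse matrix
WHERE weight IS NOT NULL
MERGE (a:Node {name: source})
MERGE (b:Node {name: target})
MERGE (a)-[r:w]-(b)
ON CREATE SET r.weight = toInteger(weight)
```

Because collect(row) holds the whole file in memory, this is only reasonable for small matrices.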
I already have some data in Neo4j, data is modelled in the below fashion :
(:A {ID:"123",Group:"ABC",Family:"XYZ"})
(:B {ID:"456",Group:"ABC",Family:"XYZ"})
(:A)-[:SCORE{score:'2'}]-(:B)
Please see the attached image for a clearer picture of how the data currently looks.
Now,
I am importing some new data through CSV file which has 5 columns
A's ID
B's ID
Score through which A is attached to B
Group
Family
In the new data there can be some new A Ids or some new B Ids
Question :
I want to create those new nodes of type A and B, create a 'Score' relationship between them, and store the score as a property on that relationship.
It is also possible that existing scores between A and B have changed, so I want to update the previous score with the new one.
How can I write Cypher to achieve this using a CSV import?
I used the below cypher query to model data for the first time:
using periodic commit LOAD CSV WITH HEADERS FROM "file:///ABC.csv" as line Merge(a:A{ID: line.A,Group:line.Group,Family:line.Family})
Merge(b:B{ID: line.A,Group:line.Group,Family:line.Family})
Merge(a)-[:Score{score:toFloat(line.Score)}]-(b)
Note: Family and Group are same for both type of nodes 'A' and 'B'
Thanks in advance.
You can MERGE the relationship and set the score after the fact so it does not create new SCORE relationships for every new value.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///ABC.csv" AS line
MERGE (a:A {ID: line.A, Group:line.Group, Family:line.Family})
MERGE (b:B {ID: line.B, Group:line.Group, Family:line.Family})
MERGE (a)-[score:SCORE]-(b)
SET score.score = toFloat(line.Score)
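One caveat with merging on ID, Group and Family together is that MERGE matches on the whole property map, so a row whose Group or Family differs from the stored node would create a second node with the same ID. A variant that merges on the ID alone is sketched below; it assumes the column holding B's ID is named B, which the question does not show:

```cypher
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///ABC.csv" AS line
// Match or create each node by its ID only
MERGE (a:A {ID: line.A})
ON CREATE SET a.Group = line.Group, a.Family = line.Family
// Assumption: the CSV column with B's ID is called "B"
MERGE (b:B {ID: line.B})
ON CREATE SET b.Group = line.Group, b.Family = line.Family
// One SCORE relationship per pair; the latest CSV value overwrites the old score
MERGE (a)-[s:SCORE]-(b)
SET s.score = toFloat(line.Score)
```

Using SET rather than ON CREATE SET on the relationship is what makes a changed score replace the previous one.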
I have a data set, which looks like this
As you can see, a single person can have multiple skillid values. Along with this data, I also have a skill_ref table with two columns (skillid and skillname), so from the image above I can see that the last person has multiple skills. I want to put this data into Neo4j, with person and skillname as nodes and a has_skill relationship between them. However, I don't know how to handle the multiple values: if I split the skillid, I will have multiple instances of the person's name, which is not what I want. I want something like this:
In the graph, the center node is the name of the person and the other nodes are skill names, with the arrows representing the has_skill relationship.
I am new to Neo4j as well as Cypher, so any help will be highly appreciated.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///people_data.csv" AS line
CREATE (p:Person{id:line.people_uid})
WITH line, p
SET p.firstName = line.first_name,p.lastname=line.last_name
WITH line,split(line.skillid,' ') as skill_ids
UNWIND skill_ids as skill_Id
MERGE (skill:Skill{id:skill_Id})
LOAD CSV WITH HEADERS FROM "file:///skills_ref.csv" AS line
WITH line
MERGE(skill:line.skillid{name:line.skillname})
CREATE (p)-[:HAS_SKILL]->(skill)
You've got the right idea.
The general approach is to first MERGE (or CREATE) the :Person node, then split() the skillid into a list of skill ids, UNWIND the skill id list into rows, then MERGE the skill for the given id (and make sure you have an index or unique constraint on :Skill(id)), then MERGE (or CREATE) the relationship between the :Person node and the :Skill.
Here's an example, loading from a CSV file:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///your.file.url" AS line
CREATE (p:Person{id:line.people_uid})
WITH line, p
SET p.firstName = line.f_name ... <same for the rest of the properties>
UNWIND split(line.skillid, ' ') as skillId
MERGE (skill:Skill{id:skillId})
CREATE (p)-[:HAS_SKILL]->(skill)
EDIT
Regarding the revised query you're attempting, it's actually better to use separate queries for each csv load, using the first to create the nodes and merge the relationships, and the second just to match/merge to skills and add the skill name:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///people_data.csv" AS line
CREATE (p:Person{id:line.people_uid})
SET p.firstName = line.first_name, p.lastname=line.last_name
WITH line, split(line.skillid, ' ') as skill_ids
UNWIND skill_ids as skill_Id
MERGE (skill:Skill{id:skill_Id})
CREATE (p)-[:HAS_SKILL]->(skill)
Then your next query:
LOAD CSV WITH HEADERS FROM "file:///skills_ref.csv" AS line
MERGE (skill:Skill{id:line.skillid})
SET skill.name = line.skillname
Remember that you should have a unique constraint created on :Skill(id) and :Person(id) first.
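The constraints could be created as follows, using the same constraint syntax that appears elsewhere on this page (newer Neo4j releases use a CREATE CONSTRAINT ... FOR ... REQUIRE form instead). The final query is just an illustrative check of the result:

```cypher
// Run these before the two imports so the MERGE lookups are index-backed
CREATE CONSTRAINT ON (p:Person) ASSERT p.id IS UNIQUE;
CREATE CONSTRAINT ON (s:Skill) ASSERT s.id IS UNIQUE;

// After both imports: list each person with the names of their skills
MATCH (p:Person)-[:HAS_SKILL]->(s:Skill)
RETURN p.id AS person, collect(s.name) AS skills;
```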
I am relatively new to neo4j.
I have imported dataset of 12 million records and I have created a relationship between two nodes. When I created the relationship, I forgot to attach a property to the relationship. Now I am trying to set the property for the relationship as follows.
LOAD CSV WITH HEADERS FROM 'file:///FileName.csv' AS row
MATCH (user:User{userID: USERID})
MATCH (order:Order{orderID: OrderId})
MATCH(user)-[acc:ORDERED]->(order)
SET acc.field1=field1,
acc.field2=field2;
But this query is taking too much time to execute.
I even tried USING INDEX on the user and order nodes:
MATCH (user:User{userID: USERID}) USING INDEX user:User(userID)
Isn't it possible to create new attributes for the relationship at a later point?
Please let me know, how can I do this operation in a quick and efficient way.
You forgot to prefix your query with USING PERIODIC COMMIT. Without it, the query builds up transaction state for 24 million changes (property updates) and won't have enough memory to keep all that state.
You also forgot the row. prefix for the data that comes from your CSV, and those names are inconsistently spelled.
If you run this from the Neo4j browser, pay attention to any yellow warning signs.
Run
CREATE CONSTRAINT ON (u:User) ASSERT u.userID IS UNIQUE;
Run
CREATE CONSTRAINT ON (o:Order) ASSERT o.orderID IS UNIQUE;
Run
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///FileName.csv' AS row
WITH row, row.USERID AS userID, row.OrderId AS orderID
MATCH (user:User{userID: userID})
USING INDEX user:User(userID)
MATCH (order:Order{orderID: orderID})
USING INDEX order:Order(orderID)
MATCH(user)-[acc:ORDERED]->(order)
SET acc.field1=row.field1, acc.field2=row.field2;
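Once the update has finished, a quick sanity check is to count the relationships that still lack the property. This is only a sketch, and it assumes field1 is present for every row of the CSV:

```cypher
// Count ORDERED relationships that the update did not reach
MATCH (:User)-[acc:ORDERED]->(:Order)
WHERE acc.field1 IS NULL
RETURN count(acc) AS missing;
```

A non-zero result would suggest that some USERID or OrderId values in the file did not match an existing node or relationship.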
I have imported a CSV where each node has 3 fields: id, parent_id, and title. This is a simple tree structure I had in MySQL. Now I need to create the relationships between those nodes based on the parent_id data, so each pair of nodes will have 2 relationships, parent and child. I'm really new to Neo4j. Any suggestions?
I tried the following, but no luck:
MATCH (b:Branch {id}), (bb:Branch {parent_id})
CREATE (b)-[:PARENT]->(bb)
It seems as though your Cypher is very close. The first thing you are going to want to do is create an index on the id and parent_id properties for the label Branch.
CREATE INDEX ON :Branch(id);
CREATE INDEX ON :Branch(parent_id);
Once you have indexes created you want to match all of the nodes with the label Branch (I would limit this with a specific value to start to make sure you create exactly what you want) and for each find the corresponding parent by matching on your indexed attributes.
MATCH (b:Branch), (bb:Branch)
WHERE b.id = ???
AND b.parent_id = bb.id
CREATE (b)-[:PARENT]->(bb)
Once you have proved this out on one branch and you get the results you expect I would run it for more branches at once. You could still choose to do it in batches depending on the number of branches in your graph.
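Running it for all branches at once could look like the sketch below. It uses MERGE instead of CREATE so that running it again does not duplicate relationships, and it assumes that root nodes have no parent_id property (or one that matches no id):

```cypher
MATCH (b:Branch)
WHERE b.parent_id IS NOT NULL
// Look up the parent through the index on :Branch(id)
MATCH (bb:Branch {id: b.parent_id})
MERGE (b)-[:PARENT]->(bb);
```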
After you have created all of the :PARENT relationships you could optionally remove all of the parent_id properties.
MATCH (b:Branch)-[:PARENT]->(:Branch)
WHERE exists(b.parent_id)
REMOVE b.parent_id
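A separate child relationship is usually not needed, because a relationship can be traversed in either direction. The sketch below reads the same :PARENT relationships both ways; the id values '1' and '42' are made up, and are written as strings on the assumption that the ids were imported from CSV without conversion:

```cypher
// Direct children of one branch: follow PARENT against its direction
MATCH (child:Branch)-[:PARENT]->(parent:Branch {id: '1'})
RETURN child.title;

// All ancestors of one branch: follow PARENT one or more hops
MATCH (b:Branch {id: '42'})-[:PARENT*]->(ancestor:Branch)
RETURN ancestor.title;
```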