Creating relationship between two nodes using intermediate table - neo4j

I'm very new to the Neo4j world, so please forgive me if this is a trivial question. I have two tables that I've loaded into the database using LOAD CSV.
artists:
artist_name,artist_id
"Bob","abc"
"Jack","def"
"James","ghi"
"Someone","jkl"
"John","mno"
agency_list:
"Agency"
"A"
"B"
"C"
"D"
Finally, I have an intermediary table that has the artist and the agencies that represent them.
artist_agencies:
artist_name,artist_id,agency
"Bob","abc", "A"
"Bob","abc", "B"
"Jack","def", "C"
"James","ghi", "C"
"Someone","jkl","B"
"Someone","jkl", "C"
"John","mno", "D"
Notice some artists can be a part of multiple agencies (which is why I didn't include the agency variable in the Artist table)
I'm trying to get four agency nodes that connect to each artist based on a :REPRESENTS relationship. Basically something like:
(agency:Agency) - [:REPRESENTS] -> (artist:Artist)
The code I've tried is:
LOAD CSV WITH HEADERS FROM "file:///agency_list.csv" as agencies
CREATE (agency:Agency {agency: agencies.Agency})
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///artists.csv" as artists
CREATE (artist:Artist {artist: artists.artist_name, artist_id: artists.artist_id})
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///artist_agencies.csv" as line
CREATE (ag:Agency) - [:REPRESENTS] -> (ar:Artist {track_artist_uri:line.track_artist_uri})
So far I'm getting this: each blue node is a duplicate of an agency name, rather than a single agency node that connects to all of its artists via the :REPRESENTS relationship.
I guess my problem is that I don't know how to relate the artists table to the agency_list table via this intermediate artist_agencies table. Is there a better way to do this or am I on the right track?
Thanks!
Joey

The artist_agencies.csv query needs to find the appropriate Agency and Artist nodes before creating a relationship between them. For example:
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///artist_agencies.csv" as line
MATCH (ag:Agency) WHERE ag.agency = line.agency
MATCH (ar:Artist) WHERE ar.artist_id = line.artist_id
CREATE (ag)-[:REPRESENTS]->(ar)
Aside: The artist_agencies.csv file does not need the artist_name column.
[UPDATE]
If the artist_agencies.csv data could cause duplicate relationships to be created, replace CREATE with (the more expensive) MERGE to avoid that. And make sure you do not have duplicate Agency or Artist nodes.
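A minimal sketch of the MERGE variant mentioned above, assuming the property names from the question's load scripts (agency on Agency nodes, artist_id on Artist nodes). The uniqueness constraints are an optional addition rather than part of the original answer, and they use the older CREATE CONSTRAINT ON ... ASSERT syntax from the same era as USING PERIODIC COMMIT; newer Neo4j releases spell this differently.

```cypher
// Optional: run once, before loading the node files, to guard against duplicate nodes.
// (Older constraint syntax; adjust for your Neo4j version.)
CREATE CONSTRAINT ON (ag:Agency) ASSERT ag.agency IS UNIQUE;
CREATE CONSTRAINT ON (ar:Artist) ASSERT ar.artist_id IS UNIQUE;

// MERGE creates the relationship only if this pair is not already connected.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///artist_agencies.csv" AS line
MATCH (ag:Agency {agency: line.agency})
MATCH (ar:Artist {artist_id: line.artist_id})
MERGE (ag)-[:REPRESENTS]->(ar);
```

If the constraints are created after duplicate nodes already exist, constraint creation will fail, so clean up or reload the nodes first.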

Related

Correct order of operations in neo4j - LOAD, MERGE, MATCH, WITH, SET

I am loading simple csv data into neo4j. The data is simple as follows :-
uniqueId compound value category
ACT12_M_609 mesulfen 21 carbon
ACT12_M_609 MNAF 23 carbon
ACT12_M_609 nifluridide 20 suphate
ACT12_M_609 sulfur 23 carbon
I am loading the data from the URL using the following query -
LOAD CSV WITH HEADERS
FROM "url"
AS row
MERGE( t: Transaction { transactionId: row.uniqueId })
MERGE(c:Compound {name: row.compound})
MERGE (t)-[r:CONTAINS]->(c)
ON CREATE SET c.category= row.category
ON CREATE SET r.price =row.value
Next I do the aggregation to count total orders for a compound and create property for a node in the following way -
MATCH (c:Compound)<-[:CONTAINS]-(t:Transaction)
WITH c, count(DISTINCT t.transactionId) AS ord
SET c.orders = ord
So far so good. I can accomplish what I want but I have the following 2 questions -
1. How can I create the orders property for the compound node in the first step itself, i.e. can I perform the aggregation straight away while loading the data?
2. For a compound node I am also setting the category property. Theoretically, it could also be modelled as category -contains-> compound by creating a Category node. But what advantage would that give me? I can already execute the queries and get the expected output without creating this additional node.
Thank you for your answer.
1. I don't think that's possible. LOAD CSV goes over one row at a time, so at row 1 it doesn't know how many more rows will follow.
I guess you could create virtual nodes and relationships, aggregate those, and then use them to create the real nodes, but that would be far more complicated (see "Virtual Nodes/Rels").
2. That depends on the questions/queries you want to ask.
A graph database is optimised for following relationships, so if you often run queries where the category is a criterion (e.g. MATCH (c:Category {category_id: 12})-[r]-(:Compound)), it might be more performant to model the category as its own node.
If you just want to get the category in the results (e.g. RETURN compound.category), then it's fine as a property.
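If the category does become something you traverse from, promoting the existing property to a node could be sketched as follows. The Category label, its name property, and the direction of CONTAINS follow the wording of the question and are assumptions, not an established schema.

```cypher
// Build one Category node per distinct category value and link it to its compounds.
MATCH (c:Compound)
WHERE c.category IS NOT NULL
MERGE (cat:Category {name: c.category})
MERGE (cat)-[:CONTAINS]->(c);

// Example traversal: all compounds in the "carbon" category.
MATCH (:Category {name: "carbon"})-[:CONTAINS]->(c:Compound)
RETURN c.name;
```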

Creating a relationship with an attribute - Neo4j

I'm learning about neo4j and I have the following question.
I have two groups of nodes, the first one is called Workers who have an ID and the name of the worker.
On the other hand there is another group of nodes, called products, which apart from the id, has the following attributes; price, name.
I want to make a relationship called "manipulate" where I relate a worker to the product that he is going to manipulate.
For this I have a trabajaensector.csv file which relates the workers by id, along with the products they are going to manipulate, also by id.
This is its form:
id1,id2,sector
1,1,fruteria
2,2,fruteria
3,2,fruteria
4,7,panaderia
5,5,fruteria
6,5,fruteria
7,9,bebidas
8,9,bebidas
9,10,bebidas
10,10,bebidas
11,3,pescaderia
12,8,panaderia
13,7,panaderia
14,9,bebidas
15,10,bebidas
16,4,pescaderia
17,2,fruteria
18,4,pescaderia
In summary, id1 (worker) manipulates id2 (product), and its sector is one of fruteria, pescaderia, panaderia or bebidas.
This is my CQL for creating manipulate relationship:
LOAD CSV WITH HEADERS FROM "file:///trabajaensector.csv" AS csvLine
MATCH (w:Worker), (p:Product)
WHERE w.id = toInt(csvLine.id1) AND p.id = toInt(csvLine.id2)
CREATE (w)-[sect:trabajasec]->(p)
RETURN sect
Here is my problem: the relationship is apparently created correctly, but I am losing the third column, "sector", which indicates the sector where the worker works when manipulating that product.
For example, the relationship for a worker named Juan who manipulates apples should have in the relation the variable / attribute "fruteria" or for fish "pescaderia".
Any idea of how to properly include that data in the relationship and how to recover it?
You can add a sector property to the trabajasec relationships:
LOAD CSV WITH HEADERS FROM "file:///trabajaensector.csv" AS csvLine
MATCH (w:Worker), (p:Product)
WHERE w.id = TOINT(csvLine.id1) AND p.id = TOINT(csvLine.id2)
CREATE (w)-[sect:trabajasec {sector: csvLine.sector}]->(p)
RETURN sect;
To use the above query, you should first delete the trabajasec relationships created by your earlier LOAD CSV query.
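The clean-up step and the "how to recover it" part of the question could look roughly like this. It is a sketch that uses only the id properties mentioned in the question, since the exact name of the worker and product name properties is not shown there.

```cypher
// Remove the relationships created by the earlier import (they have no sector).
MATCH (:Worker)-[old:trabajasec]->(:Product)
DELETE old;

// After re-running the import above, read the sector back from the relationship.
MATCH (w:Worker)-[sect:trabajasec]->(p:Product)
RETURN w.id AS worker, p.id AS product, sect.sector AS sector;

// Or filter on it, e.g. everything handled in the "fruteria" sector.
MATCH (w:Worker)-[:trabajasec {sector: "fruteria"}]->(p:Product)
RETURN w.id, p.id;
```

Note that toInt was later replaced by toInteger, so on recent Neo4j versions the import query needs that substitution.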

setting up relationships with a table for the links

I'm very new to Neo4j:
I'm moving MySQL data to visualise and analyse the data, but I can't set up the relationships.
So far my build script looks like this:
// Create Players
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:player.csv" AS row
CREATE (:Player { playerID: row.id, name: row.Name });
CREATE INDEX ON :Player(name);
CREATE INDEX ON :Player(playerID);
// Create Team
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:team.csv" AS row
CREATE (:Team { teamID: row.id });
CREATE INDEX ON :Team(teamID);
// Create PlayerLinks
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:playerlinks.csv" AS row
CREATE (:Links { linkID: row.id, fromPlayerID: row.fromPlayerID, toPlayerId: row.toPlayerId, teamID: row.teamID, years: row.years });
MATCH (p:Player),(t:Team), (l:Links)
WHERE l.fromPlayerID = p.playerID
AND l.toPlayerId = p.playerID
AND l.teamID = t.teamID
CREATE
The table playerlinks contains the relationships I'd like to create
Here's a diagram of what I'm aiming to achieve:
Looks like you're almost there, actually.
As mentioned in my comments on the question itself, you'll want to drop index creation from your scripts (those should be applied only once before you do your import, and you should consider using unique constraints for ID fields).
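As a sketch of that one-time setup, the schema statements could be pulled out of the import script and run on their own first. The constraint syntax below is the older form that matches the CREATE INDEX ON style used in the question; newer Neo4j versions use a different spelling, so treat it as an assumption about the version in use.

```cypher
// Run once, before the LOAD CSV statements.
// A uniqueness constraint also provides an index on the property.
CREATE CONSTRAINT ON (p:Player) ASSERT p.playerID IS UNIQUE;
CREATE CONSTRAINT ON (t:Team) ASSERT t.teamID IS UNIQUE;

// A plain index is enough for non-unique lookups such as names.
CREATE INDEX ON :Player(name);
```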
As for your :Links nodes, is your plan to only use them to create relationships, or do you plan on keeping them around afterwards?
The approach for keeping :Links around as intermediate nodes with relationships from your :Links nodes to other elements of your graph, might look like this:
MATCH (l:Links)
WITH l
MATCH (p1:Player), (p2:Player), (t:Team)
WHERE l.fromPlayerID = p1.playerID
AND l.toPlayerId = p2.playerID
AND l.teamID = t.teamID
MERGE (l)-[:Teammate]->(p1)
MERGE (l)-[:Teammate]->(p2)
MERGE (l)-[:PlayedOn]->(t)
That connects your :Links node to the players who are teammates and to the :Team they played on, while the :Links node itself holds the years they played together. At that point you can remove the linkID, fromPlayerID, toPlayerId, and teamID properties from the node: in a graph db, relationships tend to replace the foreign keys of a relational db, and you likely won't be looking up :Links nodes by ID.
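Removing those properties could be done with a single statement like the one below, using the property names exactly as they are spelled in the build script (note toPlayerId with a lowercase d).

```cypher
// Drop the foreign-key style properties once the relationships exist; years stays.
MATCH (l:Links)
REMOVE l.linkID, l.fromPlayerID, l.toPlayerId, l.teamID;
```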
Alternately (and according to your desired diagram) you can use the info on the :Links nodes to create relationships between :Players directly. You can set attributes on the relationships for the number of years played together and the ID (or name) of the team they played on. Keep in mind that the relationship itself will not be able to point at the :Team node where the players played together, though you should be able to use that info to create :PlayedOn relationships from the :Players to the :Team in question.
That kind of modeling might look like this:
MATCH (l:Links)
WITH l
MATCH (p1:Player), (p2:Player), (t:Team)
WHERE l.fromPlayerID = p1.playerID
AND l.toPlayerId = p2.playerID
AND l.teamID = t.teamID
MERGE (p1)-[:Teammate{years: l.years, team: t.teamID}]->(p2)
MERGE (p1)-[:PlayedOn]->(t)
MERGE (p2)-[:PlayedOn]->(t)
Keep in mind that the MERGE of the :Teammate relationship may be slow. If you plan on running this only once, you can use CREATE instead of MERGE.
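If the :Links nodes are not needed at all, one alternative (not part of the answer above, so treat it as a sketch) is to create the relationships straight from playerlinks.csv. It assumes the players and teams are already loaded, that the file is imported only once, and that the column names match those used in the build script.

```cypher
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:playerlinks.csv" AS row
// Look up both players and the team referenced by this row.
MATCH (p1:Player {playerID: row.fromPlayerID})
MATCH (p2:Player {playerID: row.toPlayerId})
MATCH (t:Team {teamID: row.teamID})
// CREATE is fine for a one-off import; use MERGE if the load may be repeated.
CREATE (p1)-[:Teammate {years: row.years, team: t.teamID}]->(p2)
MERGE (p1)-[:PlayedOn]->(t)
MERGE (p2)-[:PlayedOn]->(t);
```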

Cypher query for creating and linking nodes from a csv file

I have a csv file generated with contents as follows
GOID GOName
GO:0007190 activation of adenylate cyclase activity
DiseaseID DiseaseName
D058490 46 XY Disorders of Sex Development
D000172 Acromegaly
D049913 ACTH-Secreting,Pituitary Adenoma
D058186 Acute Kidney Injury
D000310 Adrenal Gland Neoplasms
D000312 Adrenal Hyperplasia Congenital
C537045 Albright's hereditary osteodystrophy
D000544 Alzheimer Disease
D019969 Amphetamine-Related Disorders
D000855 Anorexia
D000860 Anoxia
D001008 Anxiety Disorders
D001169 Arthritis Experimental
D001171 Arthritis Juvenile
D001172 Arthritis Rheumatoid
D001249 Asthma
D001254 Astrocytoma
and so on.
I want to create link between GOIDs through Diseases such that one disease node is connected to two or more different GOID nodes.
My output should look like this
Load all your diseases at once under a :Disease label.
Load all your GO data at once under a :Global label.
Create another CSV file with the Global->Disease linkages, and use MERGE to create the relationships.
The relationship CSV would look like this:
goID,diseaseID
"GO:1234","D000456"
The command to read the CSV and create the relationships would look like this:
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:/D:/Relationships.csv" AS line
MATCH (g:Global {goID: line.goID})
MATCH (d:Disease {diseaseID: line.diseaseID})
MERGE (g)-[:RELATIONSHIP]->(d)
Once your data is loaded, you can then query it like so:
MATCH (g:Global {goID: "GO:0007190"})-[r:RELATIONSHIP]->(d:Disease)
return g, r, d
For cases where a disease has multiple global conditions, you can find and create a relationship like so:
MATCH (d:Disease)
MATCH (go1:Global)-[:RELATIONSHIP]->(d)
MATCH (go2:Global)-[:RELATIONSHIP]->(d) WHERE go1.goID < go2.goID
CREATE (go1)-[:RELATIONSHIP]->(go2)
CREATE (go2)-[:RELATIONSHIP]->(go1)
Strictly speaking you don't need a bi-directional relationship, so creating the second relationship could be left out. One potential concern is if more than one disease links two global values. If that is a concern, then setting a "Disease" property on the relationship would help identify how these globals are related.
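The property idea from the last sentence might be sketched like this. The property name disease is an assumption chosen for illustration, and the :Global label and goID/diseaseID properties follow the load query above.

```cypher
// One relationship per pair of GO terms and shared disease, tagged with the disease ID.
MATCH (go1:Global)-[:RELATIONSHIP]->(d:Disease)<-[:RELATIONSHIP]-(go2:Global)
WHERE go1.goID < go2.goID
MERGE (go1)-[:RELATIONSHIP {disease: d.diseaseID}]->(go2);

// List which diseases connect two given GO terms ("GO:1234" is the placeholder ID from above).
MATCH (:Global {goID: "GO:0007190"})-[r:RELATIONSHIP]-(:Global {goID: "GO:1234"})
RETURN r.disease;
```

Comparing goID values with < rather than <> means each pair is handled once, so no mirrored duplicates are produced.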

Loading relationships from CSV data into neo4j db

Neo4j 2.1.7
Attempting to mass-connect a bunch of nodes via information I've received in a CSV, which looks like:
person_id,book_id,relationship
111,AAA,OWNS
222,BBB,BORROWS
333,AAA,BORROWS
The nodes :Person and :Book used in this CSV were successfully loaded via LOAD CSV and CREATE statements, and already exist in the database. Now, I'd like to load this above CSV of relationships between :Person and :Book. The relationships are defined in the CSV itself.
LOAD CSV WITH HEADERS FROM "file:data.csv" AS row
MATCH (person:Person { personID: row.person_id })
MATCH (book:Book { bookID: row.book_id })
Sure, the next MERGE command works if I supply a specific name ([:OWNS], [:BORROWS], etc.) but as you can see, my relationships are supplied by the incoming data.
However, I'd like the relationship defined in MERGE to not be a "hard-coded" string, but come as data from the 3rd column of my CSV instead. Something along the lines of:
MERGE (person)-[row.relationship]->(book)
Is this even possible?
PS: I've tried the syntax above, and also -[:row.relationship]->, both to no avail (syntax errors)
I don't think it is possible with LOAD CSV. You need to do a little trickery with the input data and collections. If the relationship in the input csv contains OWNS create a collection with a one in it otherwise create an empty collection. Do the same for the BORROWS relationship value. It will be something like this...
...
case when row.relationship = "OWNS" then [1] else [] end as owns,
case when row.relationship = "BORROWS" then [1] else [] end as borrows
foreach(x in owns | MERGE (person)-[:OWNS]->(book))
foreach(x in borrows | MERGE (person)-[:BORROWS]->(book))
...
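Putting the fragment together with the MATCH clauses from the question gives something like the following. How the elided parts of the answer were meant to read is an assumption; the WITH clause is simply one way to bring the two collections into scope.

```cypher
LOAD CSV WITH HEADERS FROM "file:data.csv" AS row
MATCH (person:Person {personID: row.person_id})
MATCH (book:Book {bookID: row.book_id})
// Turn the relationship column into a one-element or empty collection per type.
WITH person, book,
     CASE WHEN row.relationship = "OWNS" THEN [1] ELSE [] END AS owns,
     CASE WHEN row.relationship = "BORROWS" THEN [1] ELSE [] END AS borrows
// FOREACH runs its MERGE once for a one-element collection and not at all for an empty one.
FOREACH (x IN owns | MERGE (person)-[:OWNS]->(book))
FOREACH (x IN borrows | MERGE (person)-[:BORROWS]->(book))
```

Each additional relationship type needs its own CASE and FOREACH pair. LOAD CSV reads every field as a string, so the MATCH clauses only find nodes whose personID and bookID were also stored as strings.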
