I'm very new to Neo4j. I'm moving MySQL data into it to visualise and analyse the data, but I can't set up the relationships.
So far my build script looks like this:
// Create Players
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:player.csv" AS row
CREATE (:Player { playerID: row.id, name: row.Name });
CREATE INDEX ON :Player(name);
CREATE INDEX ON :Player(playerID);
// Create Team
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:team.csv" AS row
CREATE (:Team { teamID: row.id });
CREATE INDEX ON :Team(teamID);
// Create PlayerLinks
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:playerlinks.csv" AS row
CREATE (:Links { linkID: row.id, fromPlayerID: row.fromPlayerID, toPlayerId: row.toPlayerId, teamID: row.teamID, years: row.years });
MATCH (p:Player),(t:Team), (l:Links)
WHERE l.fromPlayerID = p.playerID
AND l.toPlayerId = p.playerID
AND l.teamID = t.teamID
CREATE
The table playerlinks contains the relationships I'd like to create.
Here's a diagram of what I'm aiming to achieve:
Looks like you're almost there, actually.
As mentioned in my comments on the question itself, you'll want to drop index creation from your scripts (those should be applied only once before you do your import, and you should consider using unique constraints for ID fields).
As for your :Links nodes, is your plan to only use them to create relationships, or do you plan on keeping them around afterwards?
If you keep the :Links nodes around as intermediate nodes, with relationships from each :Links node to the other elements of your graph, the query might look like this:
MATCH (l:Links)
WITH l
MATCH (p1:Player), (p2:Player), (t:Team)
WHERE l.fromPlayerID = p1.playerID
AND l.toPlayerId = p2.playerID
AND l.teamID = t.teamID
MERGE (l)-[:Teammate]->(p1)
MERGE (l)-[:Teammate]->(p2)
MERGE (l)-[:PlayedOn]->(t)
That connects each :Links node to the two players who are teammates and to the :Team they played on, while the :Links node itself holds the years they played together. At that point you can remove the linkID, fromPlayerID, toPlayerId, and teamID properties from the node: when translating from a relational database, relationships tend to replace foreign keys in a graph database, and you likely won't be looking up :Links nodes by ID.
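To make the intermediate-node model concrete, here is a small plain-Python sketch (not Neo4j code) of which edges the query above produces. The sample rows and the IDs `p1`, `p2`, `p3`, `t1` are hypothetical; only the column names come from the question. Rows whose players or team are missing yield nothing, the way MATCH drops unmatched rows. Edges are kept in a set to mimic MERGE.

```python
import csv
import io

# Hypothetical rows shaped like playerlinks.csv (column names from the
# question; the values are made up for illustration).
PLAYERLINKS_CSV = """id,fromPlayerID,toPlayerId,teamID,years
101,p1,p2,t1,3
102,p2,p3,t1,1
"""


def links_model_edges(csv_text, player_ids, team_ids):
    """Return the edges the intermediate-:Links-node query would produce.

    Each row yields a Links node with two Teammate edges (one per player)
    and one PlayedOn edge, but only when both players and the team exist,
    mirroring how MATCH silently drops rows with no matching nodes.
    Edges go in a set, so re-running adds nothing new, as with MERGE.
    """
    edges = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        a, b, team = row["fromPlayerID"], row["toPlayerId"], row["teamID"]
        if a in player_ids and b in player_ids and team in team_ids:
            link = ("Links", row["id"])
            edges.add((link, "Teammate", ("Player", a)))
            edges.add((link, "Teammate", ("Player", b)))
            edges.add((link, "PlayedOn", ("Team", team)))
    return edges


edges = links_model_edges(PLAYERLINKS_CSV, {"p1", "p2", "p3"}, {"t1"})
print(len(edges))  # 6: two Teammate edges and one PlayedOn edge per link row
```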
Alternatively (and in line with your desired diagram), you can use the info on the :Links nodes to create relationships between :Player nodes directly. You can set properties on those relationships for the number of years played together and the ID (or name) of the team they played on. Keep in mind that a relationship cannot point at the :Team node where the players played together, though you can use that info to create :PlayedOn relationships from the :Player nodes to the :Team in question.
That kind of modeling might look like this:
MATCH (l:Links)
WITH l
MATCH (p1:Player), (p2:Player), (t:Team)
WHERE l.fromPlayerID = p1.playerID
AND l.toPlayerId = p2.playerID
AND l.teamID = t.teamID
MERGE (p1)-[:Teammate{years: l.years, team: t.teamID}]->(p2)
MERGE (p1)-[:PlayedOn]->(t)
MERGE (p2)-[:PlayedOn]->(t)
Keep in mind that MERGE-ing the :Teammate relationship may be slow. If you only plan on running this once, you can use CREATE instead of MERGE.
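As a rough plain-Python analogy (not how Neo4j works internally), running the same import twice shows the trade-off. CREATE writes a duplicate relationship on the second run. MERGE first looks for an identical relationship and skips the write. The row values below are hypothetical.

```python
# Plain-Python model of running the same import twice with CREATE vs MERGE.
# Assumption: a relationship is identified by (start, type, end, properties),
# which is how MERGE matches a relationship pattern that includes properties.


def run_import(rows, store, use_merge):
    for p1, p2, years, team in rows:
        rel = (p1, "Teammate", p2, (("team", team), ("years", years)))
        if use_merge:
            # MERGE: look for an identical relationship first; create only if absent.
            if rel not in store:
                store.append(rel)
        else:
            # CREATE: always write a new relationship.
            store.append(rel)
    return store


rows = [("p1", "p2", "3", "t1")]  # hypothetical single link row

created, merged = [], []
for _ in range(2):  # run the same import twice
    run_import(rows, created, use_merge=False)
    run_import(rows, merged, use_merge=True)

print(len(created), len(merged))  # 2 1
```

The extra lookup on every row is one intuition for why MERGE costs more, and why CREATE is fine for a one-off load.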
Related
How do I create a new node with two new outgoing edges? I know how to create one, but I can't figure out how to create the second. Of course, I could do it in a separate MATCH statement, but it seems cleaner to create both at once:
LOAD CSV WITH HEADERS FROM ... AS ROW
MATCH (father:Father), (mother:Mother)
WHERE father.id = ROW.father_id AND mother.id = ROW.mother_id
CREATE (child:Child{ ... })-[:IS_CHILD_OF]->(father)
// what about the IS_CHILD_OF -> mother?
In these cases you have to be careful with CREATE, because when you re-run the query you may end up with duplicates, e.g. of [:IS_CHILD_OF] relationships.
Also, when you MERGE a pattern, everything in it that is not already bound is created whenever the full pattern is not found, e.g. a second copy of a father node that is already in the store.
For these reasons, it is better to use an approach like the one below, which also uses a :Person label. You may want to add a gender property on those nodes.
LOAD CSV WITH HEADERS FROM ... AS ROW
MERGE (child:Person {id: ROW.child_id})
MERGE (father:Person {id: ROW.father_id})
MERGE (mother:Person {id: ROW.mother_id})
MERGE (child)-[:IS_CHILD_OF]->(father)
MERGE (child)-[:IS_CHILD_OF]->(mother)
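Here is a simplified plain-Python model of the pattern-MERGE caveat. It is an analogy, not Neo4j's implementation, and the ids `f1` and `c1` are hypothetical. It assumes the father was already loaded before the child row is processed. It then compares two approaches:

- MERGE-ing the whole `(father)<-[:IS_CHILD_OF]-(child)` pattern.
- MERGE-ing the father node on its own, then the relationship.

```python
# Simplified plain-Python model of two MERGE styles (an analogy, not how
# Neo4j implements MERGE). Nodes live in a list; relationships are
# (child_index, parent_index) pairs in a set.


def merge_node(nodes, node_id):
    """Like MERGE (n:Person {id: node_id}): reuse a node with that id, else create it."""
    for i, node in enumerate(nodes):
        if node["id"] == node_id:
            return i
    nodes.append({"id": node_id})
    return len(nodes) - 1


def merge_whole_pattern(nodes, rels, child, parent_id):
    """Like MERGE (p:Person {id: parent_id})<-[:IS_CHILD_OF]-(child) with p unbound:
    if the full pattern is not found, all of it is created, including a new parent node."""
    for c, p in rels:
        if c == child and nodes[p]["id"] == parent_id:
            return p
    nodes.append({"id": parent_id})
    rels.add((child, len(nodes) - 1))
    return len(nodes) - 1


def merge_node_then_rel(nodes, rels, child, parent_id):
    """Like MERGE (p:Person {id: parent_id}) followed by MERGE (child)-[:IS_CHILD_OF]->(p)."""
    p = merge_node(nodes, parent_id)
    rels.add((child, p))
    return p


# The father "f1" was already loaded (e.g. from an earlier row) before this child.
nodes_a, rels_a = [{"id": "f1"}], set()
merge_whole_pattern(nodes_a, rels_a, merge_node(nodes_a, "c1"), "f1")

nodes_b, rels_b = [{"id": "f1"}], set()
merge_node_then_rel(nodes_b, rels_b, merge_node(nodes_b, "c1"), "f1")

print(sum(n["id"] == "f1" for n in nodes_a))  # 2: duplicate father node
print(sum(n["id"] == "f1" for n in nodes_b))  # 1: existing father reused
```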
A simpler answer is to create both relationships in a single pattern:
CREATE (mother)<-[:IS_CHILD_OF]-(child: Child{ ... })-[:IS_CHILD_OF]->(father)
I'm very new to the Neo4j world, so please forgive me if this is a trivial question. I have two tables that I've loaded into the database using LOAD CSV:
artists:
artist_name,artist_id
"Bob","abc"
"Jack","def"
"James","ghi"
"Someone","jkl"
"John","mno"
agency_list:
"Agency"
"A"
"B"
"C"
"D"
Finally, I have an intermediary table that has the artist and the agencies that represent them.
artist_agencies:
artist_name,artist_id,agency
"Bob","abc", "A"
"Bob","abc", "B"
"Jack","def", "C"
"James","ghi", "C"
"Someone","jkl","B"
"Someone","jkl", "C"
"John","mno", "D"
Notice some artists can be a part of multiple agencies (which is why I didn't include the agency variable in the Artist table)
I'm trying to get four agency nodes that connect to each artist based on a :REPRESENTS relationship. Basically something like:
(agency:Agency) - [:REPRESENTS] -> (artist:Artist)
The code I've tried is:
LOAD CSV WITH HEADERS FROM "file:///agency_list.csv" as agencies
CREATE (agency:Agency {agency: agencies.Agency})
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///artists.csv" as artists
CREATE (artist:Artist {artist: artists.artist_name, artist_id: artists.artist_id})
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///artist_agencies.csv" as line
CREATE (ag:Agency) - [:REPRESENTS] -> (ar:Artist {track_artist_uri:line.track_artist_uri})
So far I'm getting this: each blue node is a duplicate of an agency name, rather than a single agency node that connects to all artists via the :REPRESENTS relationship.
I guess my problem is that I don't know how to relate the artists table to the agency_list table via this intermediate artist_agencies table. Is there a better way to do this or am I on the right track?
Thanks!
Joey
The artist_agencies.csv query needs to find the appropriate Agency and Artist nodes before creating a relationship between them. For example:
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///artist_agencies.csv" as line
MATCH (ag:Agency) WHERE ag.agency = line.agency
MATCH (ar:Artist) WHERE ar.artist_id = line.artist_id
CREATE (ag)-[:REPRESENTS]->(ar)
Aside: The artist_agencies.csv file does not need the artist_name column.
[UPDATE]
If the artist_agencies.csv data could cause duplicate relationships to be created, replace CREATE with (the more expensive) MERGE to avoid that. And make sure you do not have duplicate Agency or Artist nodes.
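The fix can be sketched in plain Python using the question's own data. The artist_name column is dropped, as the aside suggests, and the stray spaces after commas are removed. This is a model of the logic, not Neo4j code:

- Each agency exists exactly once, keyed by name.
- Each artist_agencies row only adds an edge between nodes that already exist.
- A set stands in for MERGE's duplicate protection.

```python
import csv
import io

# Data from the question (artist_name dropped, stray spaces after commas removed).
AGENCIES = ["A", "B", "C", "D"]
ARTIST_IDS = {"abc", "def", "ghi", "jkl", "mno"}
ARTIST_AGENCIES = """artist_id,agency
abc,A
abc,B
def,C
ghi,C
jkl,B
jkl,C
mno,D
"""


def build_graph(agency_names, artist_ids, link_csv):
    # One node per agency name, created once up front (like the agency_list load).
    agencies = {name: {"agency": name} for name in agency_names}
    represents = set()  # (agency, artist_id); a set mimics MERGE on repeated rows
    for row in csv.DictReader(io.StringIO(link_csv)):
        # Like the two MATCH clauses: rows naming an unknown agency or artist add nothing.
        if row["agency"] in agencies and row["artist_id"] in artist_ids:
            represents.add((row["agency"], row["artist_id"]))
    return agencies, represents


agencies, represents = build_graph(AGENCIES, ARTIST_IDS, ARTIST_AGENCIES)
print(len(agencies), len(represents))  # 4 7
```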
I'm learning about Neo4j and I have the following question.
I have two groups of nodes. The first, called Workers, have an id and the name of the worker.
The second group, called Products, has an id plus two attributes: price and name.
I want to create a relationship called "manipulate" that relates a worker to the product he is going to manipulate.
For this I have a trabajaensector.csv file that relates each worker, by id, to the product he is going to manipulate, also by id.
This is its form:
id1,id2,sector
1,1,fruteria
2,2,fruteria
3,2,fruteria
4,7,panaderia
5,5,fruteria
6,5,fruteria
7,9,bebidas
8,9,bebidas
9,10,bebidas
10,10,bebidas
11,3,pescaderia
12,8,panaderia
13,7,panaderia
14,9,bebidas
15,10,bebidas
16,4,pescaderia
17,2,fruteria
18,4,pescaderia
In summary, id1 (worker) manipulates id2 (product), and the sector is one of fruteria, pescaderia, panaderia or bebidas (fruit shop, fishmonger, bakery or drinks).
This is my CQL (Cypher) for creating the manipulate relationship:
LOAD CSV WITH HEADERS FROM "file:///trabajaensector.csv" AS csvLine
MATCH (w:Worker), (p:Product)
WHERE w.id = toInt(csvLine.id1) AND p.id = toInt(csvLine.id2)
CREATE (w)-[sect:trabajasec]->(p)
RETURN sect
Here is my problem: the relationships appear to be created correctly, but I am losing the third column, "sector", which indicates the sector in which the worker manipulates that product.
For example, the relationship for a worker named Juan who manipulates apples should carry the attribute "fruteria", or "pescaderia" for fish.
Any idea how to include that data in the relationship properly, and how to retrieve it?
You can add a sector property to the trabajasec relationships:
LOAD CSV WITH HEADERS FROM "file:///trabajaensector.csv" AS csvLine
MATCH (w:Worker), (p:Product)
WHERE w.id = TOINT(csvLine.id1) AND p.id = TOINT(csvLine.id2)
CREATE (w)-[sect:trabajasec {sector: csvLine.sector}]->(p)
RETURN sect;
To use the above query, you should first delete the trabajasec relationships created by your earlier LOAD CSV query.
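Here is a plain-Python sketch of the same idea, using a few rows of the question's trabajaensector.csv. It models the data flow and is not Neo4j code: the third column simply travels with each relationship as a property. In Cypher, you would read it back by returning the relationship property, e.g. `s.sector` after matching `(w:Worker)-[s:trabajasec]->(p:Product)`.

```python
import csv
import io

# A few rows of trabajaensector.csv from the question.
TRABAJA = """id1,id2,sector
1,1,fruteria
2,2,fruteria
4,7,panaderia
11,3,pescaderia
"""


def load_rels(csv_text):
    rels = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rels.append({
            "worker": int(row["id1"]),   # integer ids, as toInt() produces in the query
            "product": int(row["id2"]),
            "type": "trabajasec",
            "props": {"sector": row["sector"]},  # third column kept on the relationship
        })
    return rels


def sectors_for_worker(rels, worker_id):
    # Retrieval: read the property back from each matching relationship.
    return [r["props"]["sector"] for r in rels if r["worker"] == worker_id]


rels = load_rels(TRABAJA)
print(sectors_for_worker(rels, 4))  # ['panaderia']
```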
I'm trying to solve a problem with displaying a one-to-many relationship in Neo4j. My dataset is as below:
child,desc,type,parent
1,PGD,Exchange,0
2,MSE 1,MSE,1
3,MSE 2,MSE,1
4,MSE 3,MSE,1
5,MSE 4,MSE,1
6,BRAS 1,BRAS,2
6,BRAS 1,BRAS,3
7,BRAS 2,BRAS,4
7,BRAS 2,BRAS,5
10,NPE 1,NPE,6
11,NPE 2,NPE,7
12,OLT,OLT,10
12,OLT,OLT,11
13,FDC,FDC,12
14,FDP,FDP,13
15,Cust 1,Customer,14
16,Cust 2,Customer,14
17,Cust 3,Customer,14
LOAD CSV WITH HEADERS FROM 'file:///FTTH_sample.csv' AS line
CREATE(:ftthsample
{child_id:line.child,
desc:line.desc,
type:line.type,
parent_id:line.parent});
//Relations
match (child:ftthsample),(parent:ftthsample)
where child.child_id=parent.parent_id
create (child)-[:test]->(parent)
//Query:
MATCH (child)-[childrel:test*]-(elem)-[parentrel:test*]->(parent)
WHERE elem.desc='FDP'
RETURN child,childrel,elem,parentrel
It returns a display as below.
I want the duplicate nodes to be displayed as one. I'm new to Neo4j. Can any of the experts help, please?
This seems like an error in your graph creation query. A few lines in your CSV specify the same node multiple times, but with different parents:
6,BRAS 1,BRAS,2
6,BRAS 1,BRAS,3
I'm guessing you actually want this to be a single node, with parent relationships to nodes with the given parent ids, instead of two separate nodes.
Let's adjust your import query. Instead of using CREATE on each line, we'll use MERGE, and only on child_id, which seems to be your primary key. (Consider naming it just id, since a node can have an id of its own without reference to whether it is a parent or a child.) We can use the ON CREATE clause after MERGE to set the remaining properties only when the MERGE created a node, rather than matching an existing one.
That will ensure we only have one node created per child_id.
Rather than having to rematch the child, we can use the child node we just created, match on the parent, and create the relationship.
LOAD CSV WITH HEADERS FROM 'file:///FTTH_sample.csv' AS line
MERGE(child:ftthsample {child_id:line.child})
ON CREATE SET
child.desc = line.desc,
child.type = line.type
WITH child, line.parent as parentId
MATCH (parent:ftthsample)
WHERE parent.child_id = parentId
MERGE (child)-[:test]->(parent)
Note that we haven't added line.parent as a property. It's not needed, since we only use that to create relationships, and after the relationships are there, we won't need those again.
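The effect of MERGE on child_id plus ON CREATE SET can be modelled in plain Python on an excerpt of the question's CSV. This is an analogy, not Neo4j code. In this sketch a parent only gets linked if it appeared in an earlier row, which holds for the sample data.

```python
import csv
import io

# Excerpt of FTTH_sample.csv from the question, including the repeated
# child 6 ("BRAS 1") with two different parents.
FTTH_CSV = """child,desc,type,parent
1,PGD,Exchange,0
2,MSE 1,MSE,1
3,MSE 2,MSE,1
6,BRAS 1,BRAS,2
6,BRAS 1,BRAS,3
"""


def import_ftth(csv_text):
    nodes = {}    # child_id -> properties: one entry per id, like MERGE on child_id
    rels = set()  # (child_id, parent_id): a set mirrors MERGE on the relationship
    for row in csv.DictReader(io.StringIO(csv_text)):
        child_id = row["child"]
        # ON CREATE SET: only the first row seen for an id sets desc/type.
        nodes.setdefault(child_id, {"desc": row["desc"], "type": row["type"]})
        # MATCH the parent: if no node has that id (e.g. parent 0), no relationship.
        if row["parent"] in nodes:
            rels.add((child_id, row["parent"]))
    return nodes, rels


nodes, rels = import_ftth(FTTH_CSV)
print(len(nodes))  # 4
print(sorted(rels))  # [('2', '1'), ('3', '1'), ('6', '2'), ('6', '3')]
```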
I have a generated CSV file with the following contents:
GOID GOName
GO:0007190 activation of adenylate cyclase activity
DiseaseID DiseaseName
D058490 46 XY Disorders of Sex Development
D000172 Acromegaly
D049913 ACTH-Secreting,Pituitary Adenoma
D058186 Acute Kidney Injury
D000310 Adrenal Gland Neoplasms
D000312 Adrenal Hyperplasia Congenital
C537045 Albright's hereditary osteodystrophy
D000544 Alzheimer Disease
D019969 Amphetamine-Related Disorders
D000855 Anorexia
D000860 Anoxia
D001008 Anxiety Disorders
D001169 Arthritis Experimental
D001171 Arthritis Juvenile
D001172 Arthritis Rheumatoid
D001249 Asthma
D001254 Astrocytoma
and so on.
I want to create links between GO IDs through diseases, such that one disease node is connected to two or more different GO ID nodes.
My output should look like this:
Load all your diseases at once under a :Disease label.
Load all your Global data at once under a :Global label.
Create another CSV file with the Global->Disease linkages, and use MERGE to create the relationships.
The relationship CSV would look like this:
goID,diseaseID
"GO:1234","D000456"
The command to read the CSV and create the relationships would look like this:
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:/D:/Relationships.csv" AS line
// MERGE each node on its own key first, so nodes loaded earlier are reused
MERGE (g:Global {goID: line.goID})
MERGE (d:Disease {diseaseID: line.diseaseID})
MERGE (g)-[:RELATIONSHIP]->(d)
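Suppose the generated file really is made of blocks like the one in the question: a GOID header, one GO line, a DiseaseID header, then one line per disease. In that case a short Python script could produce the goID,diseaseID relationship CSV that the command reads. That block layout, and the ID being the first whitespace-separated token on each line, are assumptions based only on the excerpt shown.

```python
import csv
import io

# Assumed layout, copied from the start of the question's excerpt.
SAMPLE = """GOID GOName
GO:0007190 activation of adenylate cyclase activity
DiseaseID DiseaseName
D058490 46 XY Disorders of Sex Development
D000172 Acromegaly
"""


def to_relationship_csv(text):
    out = io.StringIO()
    out.write("goID,diseaseID\n")  # header as in the relationship CSV example
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    current_go = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        key = line.split(None, 1)[0]  # ID = text before the first whitespace
        if key in ("GOID", "DiseaseID"):
            continue  # skip header lines
        if key.startswith("GO:"):
            current_go = key  # start of a new GO block
        elif current_go is not None:
            writer.writerow([current_go, key])  # disease under the current GO
    return out.getvalue()


print(to_relationship_csv(SAMPLE))
```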
Once your data is loaded, you can query it like so:
MATCH (g:Global {goID: "GO:0007190"})-[r:RELATIONSHIP]->(d:Disease)
return g, r, d
For cases where a disease has multiple global conditions, you can find them and create relationships between them like so:
match (d:Disease)
match (go1:Global)-[:RELATIONSHIP]->(d)
match (go2:Global)-[:RELATIONSHIP]->(d) where go2 <> go1
merge (go1)-[:RELATIONSHIP]->(go2)
merge (go2)-[:RELATIONSHIP]->(go1)
Strictly speaking, you don't need a bi-directional relationship. Because the MATCH returns each pair of GO nodes in both orders, (go1, go2) and (go2, go1), the first relationship clause alone already links each pair in both directions, so the second clause can be left out. One potential concern is when more than one disease links the same two global values. If that matters, setting a "Disease" property on the relationship would help identify how these globals are related.
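As a plain-Python sketch of the same logic (hypothetical IDs, not Neo4j code), you can group the goID,diseaseID rows by disease and pair every two GO IDs within a group. That gives the GO-to-GO links. Keeping the disease in each link plays the role of the suggested "Disease" property, so the same two GO IDs linked through two diseases stay distinguishable.

```python
from collections import defaultdict
from itertools import permutations


def go_links(pairs):
    """pairs: iterable of (goID, diseaseID) rows, as in the relationship CSV.

    Returns (go1, go2, diseaseID) links for every two distinct GO IDs that
    share a disease, keeping the disease as a property of the link.
    """
    by_disease = defaultdict(set)
    for go_id, disease_id in pairs:
        by_disease[disease_id].add(go_id)
    links = set()
    for disease_id, go_ids in by_disease.items():
        # permutations() yields both (a, b) and (b, a), just as the MATCH
        # returns each pair of GO nodes in both orders.
        for go1, go2 in permutations(sorted(go_ids), 2):
            links.add((go1, go2, disease_id))
    return links


# Hypothetical rows: GO:A and GO:B share both D1 and D2.
pairs = [("GO:A", "D1"), ("GO:B", "D1"), ("GO:B", "D2"), ("GO:C", "D2"), ("GO:A", "D2")]
links = go_links(pairs)
print(len(links))  # 8
```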