Creating relationships when using LOAD CSV in Cypher - neo4j

I have a really simple CSV that I created so I can practice loading CSVs into Neo4j.
The CSV looks like this:
boxer_id,name,boxer_country,total_wins,bdate,fought,fight_id,fight_location,outcome
1,Glass Joe,France,0,1/2/80,2,100,Las Vegas,L
2,Bald Bull,Turkey,2,2/3/81,1,100,Macao,W
3,Soda Popinski,Russia,6,3/4/82,4,101,Atlantic City,L
4,Sandman,USA,9,4/5/83,3,101,Japan,W
I want to make 2 nodes, boxer and fight.
But I'm having trouble connecting the boxers to the fights.
Here's as far as I got:
As you can see, I successfully read in the nodes, but I don't know how to create the relationship between boxers and their boxing matches.
I want to do something like:
CREATE (boxer)-[:AGAINST]->(boxer)
but this doesn't make sense. I need to use the field fought, which records who has faced whom in the ring.
Any advice would be greatly appreciated. I'm not sure how to do this in the context of LOAD CSV.
Here's my code:
// The goal here is to create a node called Boxer, and pull in properties.
LOAD CSV WITH HEADERS FROM
'file:///test.csv' AS line
WITH line, SPLIT(line.bdate, '/') AS bdate
CREATE (b:boxer {boxer_id: line.boxer_id})
SET b.byear= TOINT(bdate[2]),
b.bmonth= TOINT(bdate[0]),
b.bday = TOINT(bdate[1]),
b.name = line.name,
b.country = line.boxer_country,
b.total_wins = TOINT(line.total_wins)
// Now we make a node called Fight
LOAD CSV WITH HEADERS FROM
'file:///test.csv' AS line
CREATE (f:fight {fight_id: line.fight_id, fight_loc: line.fight_location})
// Now we set relationships
// ????

You could add a few lines to match the boxers you already created and create relationships between them and the newly created fight. I am thinking something along these lines might work for you...
LOAD CSV WITH HEADERS FROM
'file:///test.csv' AS line
MATCH (b1:boxer {boxer_id: line.boxer_id})
WITH line, b1
MATCH (b2:boxer {boxer_id: line.fought})
MERGE (f:fight {fight_id: line.fight_id})
CREATE (b1)-[:AGAINST]->(b2)
// MERGE, not CREATE: each fight appears in two rows, so CREATE would duplicate these
MERGE (b1)-[:FOUGHT_IN]->(f)
MERGE (b2)-[:FOUGHT_IN]->(f)

One option is to just model fights as relationships between Boxer nodes, instead of creating the Fight nodes:
LOAD CSV WITH HEADERS FROM 'file:///test.csv' AS line
MERGE (b1:Boxer {boxer_id: line.boxer_id})
MERGE (b2:Boxer {boxer_id: line.fought})
CREATE (b1)-[f:fought]->(b2)
SET f.location = line.fight_location,
f.outcome = line.outcome
However it probably makes more sense to model the fights as nodes, since they are events. In that case something like this:
LOAD CSV WITH HEADERS FROM 'file:///test.csv' AS line
MATCH (b:Boxer {boxer_id: line.boxer_id})
MERGE (f:Fight {fight_id: line.fight_id})
ON CREATE SET f.location = line.fight_location
CREATE (b)-[r:FOUGHT_IN]->(f)
WITH r, CASE line.outcome WHEN "W" THEN [1] ELSE [] END AS win
FOREACH (x IN win | SET r.winner = TRUE)
Note here that we are storing the outcome of the fight as a property on the :FOUGHT_IN relationship.
Edit: Updated to use MERGE to avoid creating duplicate Fight nodes. When using MERGE you should also create a uniqueness constraint before running the import script: CREATE CONSTRAINT ON (f:Fight) ASSERT f.fight_id IS UNIQUE;
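To illustrate how the outcome stored on the :FOUGHT_IN relationship can be read back, here is a minimal query sketch. It assumes the import has already run, that boxer nodes carry the :Boxer label and a name property (labels are case-sensitive, so nodes created as :boxer would not match), and that fight nodes carry the :Fight label with a location property.

```cypher
// List each fight with its winner; r.winner is only set on rows whose outcome was "W"
MATCH (b:Boxer)-[r:FOUGHT_IN]->(f:Fight)
WHERE r.winner = true
RETURN f.fight_id AS fight, f.location AS location, b.name AS winner
ORDER BY fight
```

Losers can be found the same way by filtering on relationships where r.winner IS NULL, since the property is never set for them.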

Related

Correct order of operations in neo4j - LOAD, MERGE, MATCH, WITH, SET

I am loading simple CSV data into Neo4j. The data looks like this:
uniqueId compound value category
ACT12_M_609 mesulfen 21 carbon
ACT12_M_609 MNAF 23 carbon
ACT12_M_609 nifluridide 20 suphate
ACT12_M_609 sulfur 23 carbon
I am loading the data from a URL using the following query:
LOAD CSV WITH HEADERS
FROM "url"
AS row
MERGE( t: Transaction { transactionId: row.uniqueId })
MERGE(c:Compound {name: row.compound})
MERGE (t)-[r:CONTAINS]->(c)
ON CREATE SET c.category= row.category
ON CREATE SET r.price =row.value
Next, I aggregate to count the total orders per compound and store the count as a node property:
MATCH (c:Compound) <-[:CONTAINS]- (t:Transaction)
// carry c itself through WITH so that it is still in scope for SET
with c, count( distinct t.transactionId) as ord
set c.orders = ord
So far so good. I can accomplish what I want, but I have two questions:
1. How can I create the orders property on the Compound node in the first step itself, i.e. perform the aggregation while loading the data?
2. For a Compound node I am also setting the category property. In theory, this could also be modelled as category -contains-> compound by creating a Category node. What advantage would that give me? I can already run my queries and get the expected output without this additional node.
Thank you for your answer.
Regarding the first question: I don't think that's possible. LOAD CSV goes over one row at a time, so at row 1 it doesn't know how many more rows will follow.
I guess you could create virtual nodes and relationships (see Virtual Nodes/Rels), aggregate those, and then use them to create the real nodes, but that would be way more complicated.
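That said, an aggregation over the CSV rows themselves may get close to what is asked. The following is only a sketch, assuming a separate pass over the same file is acceptable and that the orders count is the number of distinct uniqueId values per compound; "url" is the placeholder from the question.

```cypher
// Aggregate per compound across all rows before writing anything.
// WITH ... count(...) groups by the non-aggregated column (name).
LOAD CSV WITH HEADERS FROM "url" AS row
WITH row.compound AS name, count(DISTINCT row.uniqueId) AS orders
MERGE (c:Compound {name: name})
SET c.orders = orders
```

This pass only writes the Compound nodes and their counts; the Transaction nodes and CONTAINS relationships would still come from the original load query.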
Regarding the second question: that depends on the questions/queries you want to ask.
A graph database is optimised for following relationships, so if you often run queries where the category is a criterion (e.g. MATCH (c:Category {category_id: 12})-[r]-(:Compound) ), it might be more performant to create a separate Category node for it.
If you just want to get the category in the results (e.g. RETURN compound.category), then it's fine as a property.
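As a rough illustration of the Category-node variant, the sketch below loads categories as their own nodes and then queries through them. The Category label's name property is an assumption made for this example, and "url" is the placeholder from the question.

```cypher
// Model the category as its own node instead of a Compound property
LOAD CSV WITH HEADERS FROM "url" AS row
MERGE (c:Compound {name: row.compound})
MERGE (cat:Category {name: row.category})
MERGE (cat)-[:CONTAINS]->(c);

// Category-driven lookup: start at one Category node and follow its relationships
MATCH (:Category {name: "carbon"})-[:CONTAINS]->(c:Compound)
RETURN c.name
```

With this shape, finding all compounds in a category starts from a single node rather than filtering every Compound by property value.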

Create relationship on Neo4j using CSV files

I want to create a simple DB using some CSV files, like this:
attore.csv, film.csv, recita.csv.
I successfully created the nodes with the labels Attore and Film from simple files like these:
attore.csv:
nome
nome1
nome2
nome3
film.csv
titolo
titolo1
titolo2
titolo3
and I was trying to create the relationship between them using recita.csv, in which each row is:
attore, film
Obviously, my primary keys should be Attore(nome) and Film(titolo).
I have been searching for a long time and found many code snippets, but none of them works; every attempt I made just ran for something like an hour.
This is what I did:
I created the film nodes:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///film.csv" AS row
CREATE (n:Film)
SET n = row, n.titolo = (row.titolo), n.durata = (row.durata),
n.genere = (row.genere), n.anno = (row.anno), n.descrizione =
(row.descrizione), n.regista = (row.regista),
n.studio_cinematografico = (row.studio_cinematografico)
Then I created the attore nodes:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///attore.csv" AS row
CREATE (n:Attore)
SET n = row, n.nome = (row.nome)
And then, after many attempts, I thought this was the right way to create the relationships, but it didn't work:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///recita.csv" AS row
MATCH (attore:Attore {nome: row.attore})
MATCH (film:Film {titolo: row.film})
MERGE (attore)-[:RECITA]-(film);
I hope that someone can tell me the right way to create the relationships, thanks.
EDIT: Examples of how my files are structured:
attore.csv:
nome
Brendan Fraser
Bett Granstaff
Leslie Nielsen
Martina Gedeck
Martin Sheen
film.csv:
titolo,durata,genere,anno,descrizione,regista,studio_cin
Mortdecai,80,Action,2015,*something*,David Koepp,Liongate
recita.csv:
attore,film
Johnny Depp,Mortdecai
Jason Momoa,Braven
Instead of the approach you are using, I would recommend using MERGE instead of CREATE; that way you avoid duplicates:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///attore.csv" AS row
MERGE (a:Attore{nome: row.nome})
RETURN a
The same applies to film.csv; just separate the properties with commas.
Second, regarding the format of your CSV files: check the CSV format documentation again. From what you have explained, if you want your code to work, recita.csv needs just two columns (attore, film), not six as you have now (attore, film attore, film attore, film). The columns are identical, but each column name must be unique, so you don't need to repeat attore and film three times.
Please check the headers of all your files, or expand your question with examples of your CSVs.
Try changing your recita.csv file so that it meets the CSV format requirements.
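Another possible contributor to hour-long runs, offered here as an assumption rather than a diagnosis: without an index on the lookup properties, each MATCH in the relationship import has to scan every node with that label for every CSV row. A minimal sketch using the legacy index syntax that matches the USING PERIODIC COMMIT era of the scripts above (newer Neo4j versions use the CREATE INDEX FOR (a:Attore) ON (a.nome) form instead):

```cypher
// Index the properties used as lookup keys in the recita.csv import
CREATE INDEX ON :Attore(nome);
CREATE INDEX ON :Film(titolo);
```

These would be run once, before the relationship import, so that MATCH (attore:Attore {nome: row.attore}) and MATCH (film:Film {titolo: row.film}) can use index lookups.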

Cypher creates relationships slowly with several node types

I have one type of node and one type of relationship.
USING PERIODIC COMMIT 500
load csv from 'http://host.int:8787/rel_import.csv' as line FIELDTERMINATOR ';'
match(c1)
with c1,line, trim(line[0]) as abs1, trim(line[1]) as abs2
match(c2)
where (c1.abs = abs1 and c2.abs = abs2) or (c1.abs = abs2 and c2.abs = abs1)
create (c1)-[rel:relations{abs1:line[0], abs2:line[1], kind:line[2],personId:line[3], rel_k1:line[4], rel_k2:line[5],contact:line[6], id:line[7]}]->(c2)
With a single node type, this was fast.
I then split the single node type into five types (the old type was deleted; the total number of entities did not change), and now creating relationships is slow. The structure of the nodes has not changed, and indexes have been created for all types.
How do I do this correctly?
I think the problem is that the where clause in your join is complex. I find that complex where clauses on joins make queries really slow. Could you use "where c1.abs <> c2.abs" instead?
Are you able to do something like this:
USING PERIODIC COMMIT 500
load csv from 'http://host.int:8787/rel_import.csv' as line FIELDTERMINATOR ';'
with line, trim(line[0]) as abs1, trim(line[1]) as abs2
match (c1 {abs: abs1})
match (c2 {abs: abs2})
match (c3 {abs: abs2})
match (c4 {abs: abs1})
where c1.abs <> c2.abs and c3.abs <> c4.abs
create (c1)-[:relations{abs1:line[0], abs2:line[1], kind:line[2],personId:line[3], rel_k1:line[4], rel_k2:line[5],contact:line[6], id:line[7]}]->(c2)
create (c3)-[:relations{abs1:line[0], abs2:line[1], kind:line[2],personId:line[3], rel_k1:line[4], rel_k2:line[5],contact:line[6], id:line[7]}]->(c4)
If possible, I'd split the c1/c2 match and the c3/c4 match and run LOAD CSV twice; I find it's best to do as few steps as possible within a single LOAD CSV.

Cypher query for creating and linking nodes from a csv file

I have a csv file generated with contents as follows
GOID GOName
GO:0007190 activation of adenylate cyclase activity
DiseaseID DiseaseName
D058490 46 XY Disorders of Sex Development
D000172 Acromegaly
D049913 ACTH-Secreting,Pituitary Adenoma
D058186 Acute Kidney Injury
D000310 Adrenal Gland Neoplasms
D000312 Adrenal Hyperplasia Congenital
C537045 Albright's hereditary osteodystrophy
D000544 Alzheimer Disease
D019969 Amphetamine-Related Disorders
D000855 Anorexia
D000860 Anoxia
D001008 Anxiety Disorders
D001169 Arthritis Experimental
D001171 Arthritis Juvenile
D001172 Arthritis Rheumatoid
D001249 Asthma
D001254 Astrocytoma
and so on.
I want to create link between GOIDs through Diseases such that one disease node is connected to two or more different GOID nodes.
My output should look like this
Load all your diseases at once under a :Disease label.
Load all your Global data at once under a :Global label.
Create another CSV file with the Global->Disease linkages, and use MERGE to create the relationships.
The relationship CSV would look like this:
goID,diseaseID
"GO:1234","D000456"
The command to read the CSV and create the relationships would look like this:
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:/D:/Relationships.csv" as line
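// MATCH the nodes loaded earlier; a MERGE on the whole pattern could create duplicate nodes
MATCH (g:Global {goID: line.goID})
MATCH (d:Disease {diseaseID: line.diseaseID})
MERGE (g)-[:RELATIONSHIP]->(d)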
Once your data is loaded, you can then query it like so:
MATCH (g:Global {goID: "GO:0007190"})-[r:RELATIONSHIP]->(d:Disease)
return g, r, d
For cases where a disease has multiple global conditions, you can find and create a relationship like so:
match (d:Disease)
match (go1:Global)-[:RELATIONSHIP]->(d)
match (go2:Global)-[:RELATIONSHIP]->(d) where go2 <> go1
merge (go1)-[:RELATIONSHIP]->(go2)
merge (go2)-[:RELATIONSHIP]->(go1)
Strictly speaking you don't need a bi-directional relationship, so creating the second relationship could be left out. One potential concern is if more than one disease links two global values. If that is a concern, then setting a "Disease" property on the relationship would help identify how these globals are related.
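The idea of tagging the relationship with the disease can be sketched as follows. This is an illustration only: it assumes the :Global and :Disease labels and the goID and diseaseID properties used above, and the diseaseID property on the relationship is introduced here for the example.

```cypher
// One relationship per pair of Global nodes per shared disease.
// Comparing goID values keeps a single direction and avoids matching each pair twice.
MATCH (go1:Global)-[:RELATIONSHIP]->(d:Disease)<-[:RELATIONSHIP]-(go2:Global)
WHERE go1.goID < go2.goID
MERGE (go1)-[:RELATIONSHIP {diseaseID: d.diseaseID}]->(go2)
```

If two Global nodes share several diseases, this produces one relationship per disease, each identifiable by its diseaseID.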

How to correctly use conditionals like IF or CASE in Cypher query language (Neo4J) to successfully create relationships?

I failed to create relationships in Neo4j and I would like to encourage anyone who has successfully done it to help me.
The desired result is to have a detailed visualisation of who is a brother to whom, who is whose mother and so on. I want to extract the data from single parent-child relationships. That means setting a relationship like [:relatedTo {how: ['daughter']}] if a node has a parent whose name corresponds to the field node.name and the gender of the node is F.
I have my CSV file that looks like this.
1;Jakub Hančin;M;1994;4;3
2;Hana Hančinová;F;1991;4;3
3;Alojz Hančin jr.;M;1968;15;14
4;Viera Hančinová;F;1968;9;
5;Miroslav Barus sr.;M;1965;9;
6;Helena Barusová;F;1942;;
7;Miroslav Barus jr.;M;1995;6;5
8;Martin Barus;M;1991;6;5
9;Hedviga Barusová;F;1945;;
10;Peter Hančin jr.;M;1991;12;13
11;Zuzka Hančinová;F;1996;12;13
12;Andrea Hančinová;F;1966;;
13;Peter Hančin sr.;M;1965;15;14
14;Alojz Hančin sr.;M;1937;;
15;Anna Hančinová;F;1945;;
This is my personal family tree and I would like to visualize it through Neo4J.
The file was created with Excel: I put the information into a table and then converted it to a .csv file that can be imported into Neo4j. I have successfully installed Neo4j and am now writing the Cypher script to import the data. So far, I have this:
LOAD CSV WITH HEADERS FROM "file:c:/users/Skelo/Desktop/Family Database/Family Database CSV UTF.txt" AS row FIELDTERMINATOR ';'
CREATE (n:Person)
SET n = row, n.name = row.name,
n.personID = toInt(row.personID) , n.G = row.G,
n.Year = toInt(row.Year), n.Parent1 = row.Parent1, n.Parent2 = row.Parent2
WITH n
MATCH(n:Person),(b:Person)
WHERE n.Parent1 = b.name OR n.Parent2 = b.name
CASE b.gender
WHEN b.gender = 'F' THEN
CREATE (b)-[:isRelatedTo{how:['mother']}]->(n)
WHEN b.gender = 'M' THEN
CREATE (b)-[:isRelatedTo{how:['father']}]->(n)
RETURN *
The error message shown looks like this.
Invalid input 'A': expected 'r/R' (line 11, column 2 (offset: 389))
"CASE b.gender"
^
Somehow, I can't figure out why this does not work. Why can't I use the CASE command? Neo4j does not allow me to use anything but CREATE here (it expects the letter R after C, not an A, which means the CREATE command).
Again, I want to do this. I have a few nodes that are correctly set. For each of those nodes (they represent people), I want to look into the Parent1 and Parent2 fields and to look for a node that has the same name as one of these fields. If it matches one of these, I want to mark that node as a father or a mother to the previous node (judging by the gender of the node, which represents the person).
This way I would like to fill the graph database with many relationships, but I fail at this very basic step. Please help me. If you can, please do not only say what is wrong and why it is wrong, but present a solution that works.
Since you want to create the isRelatedTo relationship regardless of gender and only the property is dependent upon a conditional, do this:
CREATE (b)-[r:isRelatedTo]->(n)
SET r.how = CASE b.gender WHEN 'F' THEN 'mother' ELSE 'father' END
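Put together with the matching step from the question, the whole thing might look like the sketch below. It assumes the Person nodes have already been loaded by a separate statement, that Parent1 and Parent2 hold parent names as the question's WHERE clause implies, and that gender is stored in a property called gender as in the question's CASE attempt.

```cypher
// Run as its own statement after the Person nodes exist
MATCH (n:Person), (b:Person)
WHERE n.Parent1 = b.name OR n.Parent2 = b.name
CREATE (b)-[r:isRelatedTo]->(n)
// CASE is an expression, so it can supply the value on the right-hand side of SET
SET r.how = CASE b.gender WHEN 'F' THEN 'mother' ELSE 'father' END
RETURN b.name AS parent, r.how AS how, n.name AS child
```

The original error arises because CASE cannot wrap clauses such as CREATE; it can only compute a value inside a clause.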

Resources