Loading relationships from CSV data into neo4j db - neo4j

Neo4j 2.1.7
Attempting to mass-connect a bunch of nodes via information I've received in a CSV, which looks like:
person_id,book_id,relationship
111,AAA,OWNS
222,BBB,BORROWS
333,AAA,BORROWS
The nodes :Person and :Book used in this CSV were successfully loaded via LOAD CSV and CREATE statements, and already exist in the database. Now, I'd like to load this above CSV of relationships between :Person and :Book. The relationships are defined in the CSV itself.
LOAD CSV WITH HEADERS FROM "file:data.csv" AS row
MATCH (person:Person { personID: row.person_id })
MATCH (book:Book { bookID: row.book_id })
Sure, the next MERGE command works if I supply a specific name ([:OWNS], [:BORROWS], etc.) but as you can see, my relationships are supplied by the incoming data.
However, I'd like the relationship defined in MERGE to not be a "hard-coded" string, but come as data from the 3rd column of my CSV instead. Something along the lines of:
MERGE (person)-[row.relationship]->(book)
Is this even possible?
PS: I've tried the syntax above, and also -[:row.relationship]->, both to no avail (syntax errors)

I don't think it is possible with LOAD CSV. You need to do a little trickery with the input data and collections. If the relationship in the input csv contains OWNS create a collection with a one in it otherwise create an empty collection. Do the same for the BORROWS relationship value. It will be something like this...
...
case when row.relationship = "OWNS" then [1] else [] end as owns
case when row.relationship = "BORROWS" then [1] else [] end as borrows
foreach(x in owns | MERGE (person)-[:OWNS]->(book))
foreach(x in borrows | MERGE (person)-[:BORROWS]->(book))
...

Related

Neo4J - unable to create relationships (30,000)

I've got two csv files Job (30,000 entries) and Cat (30 entries) imported into neo4j and am trying to create a relationship between them
Each Job has a cat_ID and Cat contains the category name and ID
after executing the following
LOAD CSV WITH HEADERS FROM 'file:///DimCategory.csv' AS row
MATCH (job:Job {cat_ID: row.cat_ID})
MATCH (cat:category {category: row.category})
CREATE (job)-[r:under]->(cat)
it returns (no changes, no records)
I received a prompt recommending that I index the category and so using
Create INDEX ON :Job(cat_id); I did, but I still get the same error
How do I create a relationship between the two?
I am able to get this to work on a smaller dataset
You are probably trying to match on non-existing nodes. Try
LOAD CSV WITH HEADERS FROM 'file:///DimCategory.csv' AS row
MERGE (job:Job {cat_ID: row.cat_ID})
MERGE (cat:category {category: row.category})
CREATE (job)-[r:under]->(cat)
Have a look in your logs and see if you are running out of memory.
You could try chunking the data set up into smaller pieces with Periodic Commit and see if that helps:
:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///DimCategory.csv' AS row
MATCH (job:Job {cat_ID: row.cat_ID})
MATCH (cat:category {category: row.category})
CREATE (job)-[r:under]->(cat)

Creating relationship between two nodes using intermediate table

I'm very new to the Neo4j world so please forgive me if this is a trivial question. I have 2 tables I've loaded into the database using LOAD CSV
artists:
artist_name,artist_id
"Bob","abc"
"Jack","def"
"James","ghi"
"Someone","jkl"
"John","mno"
agency_list:
"Agency"
"A"
"B"
"C"
"D"
Finally, I have an intermediary table that has the artist and the agencies that represent them.
artist_agencies:
artist_name,artist_id,agency
"Bob","abc", "A"
"Bob","abc", "B"
"Jack","def", "C"
"James","ghi", "C"
"Someone","jkl","B"
"Someone","jkl", "C"
"John","mno", "D"
Notice some artists can be a part of multiple agencies (which is why I didn't include the agency variable in the Artist table)
I'm trying to get four agency nodes that connect to each artist based on a :REPRESENTS relationship. Basically something like:
(agency:Agency) - [:REPRESENTS] -> (artist:Artist)
The code I've tried is:
LOAD CSV WITH HEADERS FROM "file:///agency_list.csv" as agencies
CREATE (agency:Agency {agency: agencies.Agency})
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///artists.csv" as artists
CREATE (artist:Artist {artist: artists.artist_name, artist_id: artists.artist_id})
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///artist_agencies.csv" as line
CREATE (ag:Agency) - [:REPRESENTS] -> (ar:Artist {track_artist_uri:line.track_artist_uri})
So far I'm getting this, each blue node is a duplicate of an agency name. Rather than just having one single agency node that connects to all artists via the :REPRESENTS relationship. result
I guess my problem is that I don't know how to relate the artists table to the agency_list table via this intermediate artist_agencies table. Is there a better way to do this or am I on the right track?
Thanks!
Joey
The artist_agencies.csv query needs to find the appropriate Agency and Artist nodes before creating a relationship between them. For example:
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///artist_agencies.csv" as line
MATCH (ag:Agency) WHERE ag.agency = line.agency
MATCH (ar:Artist) WHERE ar.artist_id = line.artist_id
CREATE (ag)-[:REPRESENTS]->(ar)
Aside: The artist_agencies.csv file does not need the artist_name column.
[UPDATE]
If the artist_agencies.csv data could cause duplicate relationships to be created, replace CREATE with (the more expensive) MERGE to avoid that. And make sure you do not have duplicate Agency or Artist nodes.

Creating a relationship with an atribute - Neo4j

I'm learning about neo4j and I have the following question.
I have two groups of nodes, the first one is called Workers who have an ID and the name of the worker.
On the other hand there is another group of nodes, called products, which apart from the id, has the following attributes; price, name.
I want to make a relationship called "manipulate" where I relate a worker to the product that he is going to manipulate.
For this I have a trabajaensector.csv file which relates the workers by id, along with the products they are going to manipulate, also by id.
This is its form:
id1,id2,sector
1,1,fruteria
2,2,fruteria
3,2,fruteria
4,7,panaderia
5,5,fruteria
6,5,fruteria
7,9,bebidas
8,9,bebidas
9,10,bebidas
10,10,bebidas
11,3,pescaderia
12,8,panaderia
13,7,panaderia
14,9,bebidas
15,10,bebidas
16,4,pescaderia
17,2,fruteria
18,4,pescaderia
In summary, id1 (worker) manipulates id2 (product) and its sector is "fruteria/pescaderia/panaderia o bebida"
This is my CQL for creating manipulate relationship:
LOAD CSV WITH HEADERS FROM "file:///trabajaensector.csv" AS csvLine
MATCH(w:Worker),(p:Product) where w.id= toInt(csvLine.id1) and p.id=
toInt(csvLine.id2) create (w)-[sect:trabajasec]->(p) return sect
Here is my problem, the relationship is apparently creating well, however I am losing that third "sector" data, which indicates the sector where the worker works by manipulating that product.
For example, the relationship for a worker named Juan who manipulates apples should have in the relation the variable / attribute "fruteria" or for fish "pescaderia".
Any idea of how to properly include that data in the relationship and how to recover it?
You can add a sector property to the trabajasec relationships:
LOAD CSV WITH HEADERS FROM "file:///trabajaensector.csv" AS csvLine
MATCH (w:Worker), (p:Product)
WHERE w.id = TOINT(csvLine.id1) AND p.id = TOINT(csvLine.id2)
CREATE (w)-[sect:trabajasec {sector: csvLine.sector}]->(p)
RETURN sect;
To use the above query, you should first delete the trabajasec relationships created by your earlier LOAD CSV query.

Cypher query for creating and linking nodes from a csv file

I have a csv file generated with contents as follows
GOID GOName
GO:0007190 activation of adenylate cyclase activity
DiseaseID DiseaseName
D058490 46 XY Disorders of Sex Development
D000172 Acromegaly
D049913 ACTH-Secreting,Pituitary Adenoma
D058186 Acute Kidney Injury
D000310 Adrenal Gland Neoplasms
D000312 Adrenal Hyperplasia Congenital
C537045 Albright's hereditary osteodystrophy
D000544 Alzheimer Disease
D019969 Amphetamine-Related Disorders
D000855 Anorexia
D000860 Anoxia
D001008 Anxiety Disorders
D001169 Arthritis Experimental
D001171 Arthritis Juvenile
D001172 Arthritis Rheumatoid
D001249 Asthma
D001254 Astrocytoma
and so on.
I want to create link between GOIDs through Diseases such that one disease node is connected to two or more different GOID nodes.
My output should look like this
Load your diseases all at once as under a :Disease label.
Load all your Global data at once under a :Global label
Create another CSV file with the Global->Disease linkages, and use MERGE to create the relationships.
The relationship CSV would look like this:
goID,diseaseID
"GO:1234","D000456"
The command to read the CSV and create the relationships would look like this:
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:/D:/Relationships.csv" as line
MERGE (:Global {goID: line.goID})-[:RELATIONSHIP]->(:Disease {diseaseID: line.diseaseID})
once your data is loaded, you can then query it like so:
MATCH (g:Global {goID: "GO:0007190"})-[r:RELATIONSHIP]->(d:Disease)
return g, r, d
For cases where a disease has multiple global conditions, you can find and create a relationship like so:
match (d:Disease)
match (go1:GO)-[:RELATIONSHIP]->(d)
match (go2:GO)-[:RELATIONSHIP]->(d) where go2 <> go1
create (go1)-[:RELATIONSHIP]->(go2)
create (go2)-[:RELATIONSHIP]->(go1)
Strictly speaking you don't need a bi-directional relationship, so creating the second relationship could be left out. One potential concern is if more than one disease links two global values. If that is a concern, then setting a "Disease" property on the relationship would help identify how these globals are related.

How to correctly use conditionals like IF or CASE in Cypher query language (Neo4J) to successfully create relationships?

I failed to create relationships in Neo4J and I would like to encourage anyone who has sucessfully done it to help me.
The desired result is to have a detailed visualisation of who is a brother to whom, who is who's mother and so on. I want to extract the data from single parent-child relationships. That means, setting a relationship like [:relatedTo {:how['daughter']}] if a node has a parent whose name corresponds to the field node.name and the gender of the node is F.
I have my CSV file that looks like this.
1;Jakub Hančin;M;1994;4;3
2;Hana Hančinová;F;1991;4;3
3;Alojz Hančin jr.;M;1968;15;14
4;Viera Hančinová;F;1968;9;
5;Miroslav Barus sr.;M;1965;9;
6;Helena Barusová;F;1942;;
7;Miroslav Barus jr.;M;1995;6;5
8;Martin Barus;M;1991;6;5
9;Hedviga Barusová;F;1945;;
10;Peter Hančin jr.;M;1991;12;13
11;Zuzka Hančinová;F;1996;12;13
12;Andrea Hančinová;F;1966;;
13;Peter Hančin sr.;M;1965;15;14
14;Alojz Hančin sr.;M;1937;;
15;Anna Hančinová;F;1945;;
This is my personal family tree and I would like to visualize it through Neo4J.
It is a file created with Excel, where I put the information into a table and create a database. Then it was converted to .csv file which is importable into Neo4J. I have sucessfully installed it and now I am at the point of writing the Cypher script to manage it. So far, I have this:
LOAD CSV WITH HEADERS FROM "file:c:/users/Skelo/Desktop/Family Database/Family Database CSV UTF.txt" AS row FIELDTERMINATOR ';'
CREATE (n:Person)
SET n = row, n.name = row.name,
n.personID = toInt(row.personID) , n.G = row.G,
n.Year = toInt(row.Year), n.Parent1 = row.Parent1, n.Parent2 = row.Parent2
WITH n
MATCH(n:Person),(b:Person)
WHERE n.Parent1 = b.name OR n.Parent2 = b.name
CASE b.gender
WHEN b.gender = 'F' THEN
CREATE (b)-[:isRelatedTo{how:['mother']}]->(n)
WHEN b.gender = 'M' THEN
CREATE (b)-[:isRelatedTo{how:['father']}]->(n)
RETURN *
The error message shown looks like this.
Invalid input 'A': expected 'r/R' (line 11, column 2 (offset: 389))
"CASE b.gender"
^
Somehow, I can't figure out why this does not work. Why can't I use the Case command? The Neo4J does not allow me to use anything but the command CREATE (it expects a letter R after C and not an A, this means the CREATE command).
Again, I want to do this. I have a few nodes that are correctly set. For each of those nodes (they represent people), I want to look into the Parent1 and Parent2 fields and to look for a node that has the same name as one of these fields. If it matches one of these, I want to mark that node as a father or a mother to the previous node (judging by the gender of the node, which represents the person).
This way I would like to fill the graph database with many relationships, but I fail at this very basic step. Please help me. If you can, please do not only say what is wrong and why it is wrong, but present a solution that works.
Since you want to create the isRelatedTo relationship regardless of gender and only the property is dependent upon a conditional, do this:
CREATE (b)-[r:isRelatedTo]->(n)
SET r.how = CASE b.gender WHEN 'F' THEN 'mother' ELSE 'father' END

Resources