I want to create a simple DB using some CSV files, like this:
attore.csv, film.csv, recita.csv.
I successfully created the nodes with the labels Attore and Film from simple files like these:
attore.csv:
nome
nome1
nome2
nome3
film.csv:
titolo
titolo1
titolo2
titolo3
Now I'm trying to create the relationships between them using recita.csv, in which each row is:
attore, film
Obviously, my primary keys should be Attore(nome) and Film(titolo).
I've been searching for a long time and found many code snippets, but none of them works: every attempt I made just ran for about an hour.
This is what I did:
I created the film nodes:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///film.csv" AS row
CREATE (n:Film)
SET n = row, n.titolo = (row.titolo), n.durata = (row.durata),
n.genere = (row.genere), n.anno = (row.anno), n.descrizione =
(row.descrizione), n.regista = (row.regista),
n.studio_cinematografico = (row.studio_cinematografico)
Then I created the attore nodes:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///attore.csv" AS row
CREATE (n:Attore)
SET n = row, n.nome = (row.nome)
Finally, after many attempts, I thought this was the right way to create the relationships, but it didn't work:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///recita.csv" AS row
MATCH (attore:Attore {nome: row.attore})
MATCH (film:Film {titolo: row.film})
MERGE (attore)-[:RECITA]-(film);
I hope someone can tell me the right way to create the relationships. Thanks.
EDIT: Examples of how my files are structured:
attore.csv:
nome
Brendan Fraser
Bett Granstaff
Leslie Nielsen
Martina Gedeck
Martin Sheen
film.csv:
titolo durata genere anno descrizione regista studio_cin
Mortdecai 80 Action 2015 *something* David Koepp Liongate
recita.csv:
attore film
Johnny Depp Mortdecai
Jason Momoa Braven
Instead of the approach you are using, I would recommend using MERGE instead of CREATE; this way you avoid creating duplicates:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///attore.csv" AS row
MERGE (a:Attore{nome: row.nome})
RETURN a
The same approach applies to film.csv; if you set several properties, just separate them with commas.
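For film.csv, a minimal sketch of that advice, assuming Neo4j 4.4 or later and the column names from the question (the last column is assumed to be studio_cinematografico, as in the original CREATE query). It MERGEs on the key, titolo, and SETs the other columns, so re-running the import updates films instead of duplicating them. The constraint names are hypothetical; a uniqueness constraint also gives MERGE and MATCH an index to look up nome and titolo. Converting durata and anno with toInteger() is an assumption that they hold whole numbers.

```cypher
// Hypothetical constraint names (Neo4j 4.4+ syntax).
CREATE CONSTRAINT attore_nome IF NOT EXISTS FOR (a:Attore) REQUIRE a.nome IS UNIQUE;
CREATE CONSTRAINT film_titolo IF NOT EXISTS FOR (f:Film) REQUIRE f.titolo IS UNIQUE;

// MERGE on the key only, then SET the remaining columns.
// Rows with an empty titolo are skipped, because MERGE fails on a null key.
LOAD CSV WITH HEADERS FROM "file:///film.csv" AS row
WITH row WHERE row.titolo IS NOT NULL
MERGE (f:Film {titolo: row.titolo})
SET f.durata = toInteger(row.durata),
    f.genere = row.genere,
    f.anno = toInteger(row.anno),
    f.descrizione = row.descrizione,
    f.regista = row.regista,
    f.studio_cinematografico = row.studio_cinematografico;
```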
Second, considering the format of your CSV files, check the CSV format documentation again. From what you have explained, to make your code work your recita.csv needs just two columns (attore, film), not six as you have (attore, film, attore, film, attore, film). Column identifiers (names) must be unique, so you don't need to repeat attore and film three times.
Please check the headers of all your files, or expand your question with examples of your CSVs.
Try changing your recita.csv file to meet the CSV format requirements.
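To see which column names LOAD CSV actually reads from recita.csv, you can return the keys of the first few rows. If, for example, the header were written as "attore, film" with a space after the comma, the second key might come back as " film", and row.film would then be null. Once the keys are exactly attore and film, a sketch of the relationship import, assuming the Attore and Film nodes already exist and nome and titolo are indexed or constrained:

```cypher
// Inspect the header names and the first rows as LOAD CSV sees them.
LOAD CSV WITH HEADERS FROM "file:///recita.csv" AS row
RETURN keys(row) AS columns, row
LIMIT 3;

// Create one directed RECITA relationship per row,
// assuming the headers are exactly attore,film.
LOAD CSV WITH HEADERS FROM "file:///recita.csv" AS row
MATCH (a:Attore {nome: row.attore})
MATCH (f:Film {titolo: row.film})
MERGE (a)-[:RECITA]->(f);
```

Rows whose actor or film does not exist are silently skipped by MATCH: with the sample files above, Johnny Depp and Jason Momoa are not in attore.csv and Braven is not in film.csv, so those rows would create nothing.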
I'm new to Neo4j and I'd like to upload a CSV file and create a set of nodes. However, some of the nodes in that CSV file may already exist. Is there an option to load the CSV, create a node for each row, and skip the row if the node already exists?
Thanks
You can use the MERGE clause to avoid creating duplicate nodes and relationships.
However, you need to carefully read the documentation to understand how to use MERGE, as incorrect usage can cause the unintentional creation of nodes and relationships.
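A minimal sketch for the question above, assuming a hypothetical file people.csv with columns id and name, a hypothetical Person label, and that id uniquely identifies a node. MERGE matches on id only, and ON CREATE SET runs just for rows whose node did not exist yet, so existing nodes are left untouched, which effectively skips those rows:

```cypher
// Hypothetical file, label, and column names.
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
MERGE (p:Person {id: row.id})
// Only runs when MERGE had to create the node.
ON CREATE SET p.name = row.name;
```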
MERGE will give you what you want; however, you must be careful about how you identify the record uniquely to prevent creating duplicates.
I'll put the desired final form first as attention spans seem to be on the decline...
// This one is safe assuming name is a true unique identifier of your Friends
// and that their favorite colors and foods may change over time
LOAD CSV FROM 'data/friends.csv' AS line
MERGE (f:Friend { name: line[0]})
SET f.favorite_food = line[1]
SET f.favorite_color = line[2]
The merge above will create or find the Friend node with that matching name and then, regardless of whether we are creating it or updating it, set the attributes on it.
If we were to instead provide all the attributes in the merge as such:
// This one is dangerous - all attributes must match in order
// to find the existing Friend node
LOAD CSV FROM 'data/friends.csv' AS line
MERGE (f:Friend { name: line[0], favorite_food: line[1], favorite_color: line[2]})
Then we would fail to find an existing friend every time their favorite_food or favorite_color was updated in the data being (re)loaded.
Here's an example for anyone whose imagination hasn't fully filled in the blanks...
//Last month's file contained:
Bob Marley,Hemp Seeds,Green
//This month's file contained:
Bob Marley,Soylent Green,Rainbow
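To try the difference without a file, the two monthly rows can be fed in with UNWIND instead of LOAD CSV (a substitution made only for this demo). The labels are hypothetical: Friend for the safe variant and FriendAllProps for the dangerous one. On an empty database, the safe MERGE leaves one Bob Marley node holding this month's values, while merging on all three attributes leaves two nodes.

```cypher
// Safe variant: MERGE on the unique name, SET the changing attributes.
UNWIND [['Bob Marley', 'Hemp Seeds', 'Green'],
        ['Bob Marley', 'Soylent Green', 'Rainbow']] AS line
MERGE (f:Friend {name: line[0]})
SET f.favorite_food = line[1], f.favorite_color = line[2];

// Dangerous variant: every attribute is part of the match, so the
// second row does not find the first node and creates another one.
UNWIND [['Bob Marley', 'Hemp Seeds', 'Green'],
        ['Bob Marley', 'Soylent Green', 'Rainbow']] AS line
MERGE (f:FriendAllProps {name: line[0], favorite_food: line[1], favorite_color: line[2]});

// On an empty database: safe_count is 1, dangerous_count is 2.
MATCH (f:Friend {name: 'Bob Marley'}) RETURN count(f) AS safe_count;
MATCH (f:FriendAllProps {name: 'Bob Marley'}) RETURN count(f) AS dangerous_count;
```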
Let's say I initially create Order nodes from the CSV file orders.csv:
// Create orders
LOAD CSV WITH HEADERS FROM 'file:///orders.csv' AS row
MERGE (order:Order {orderID: row.OrderID})
ON CREATE SET order.shipName = row.ShipName
Later I added more columns to the orders.csv, and I suppose I can add new properties into the graph this way:
LOAD CSV WITH HEADERS FROM 'file:///orders.csv' AS row
MERGE (order:Order {orderID: row.OrderID})
ON CREATE SET order.shipName = row.ShipName, order.customerId = row.CustomerID, order.employeeID = row.EmployeeID;
Here, two new properties, customerId and employeeID, should be added to each Order node. I tested this command, but it doesn't change the graph at all. Does MERGE incrementally add to the graph?
MERGE works on exactly the expression you provide it, so
MERGE (order:Order {orderID: row.OrderID})
will check for a node with the label Order and an orderID property set to the value (and type) of row.OrderID. If no such node exists, it will be created.
Because you are using ON CREATE, that SET line only runs if the node is being created by the MERGE, not if it is simply found (matched).
You probably want to look at using ON MATCH... instead - https://neo4j.com/docs/cypher-manual/current/clauses/merge/#query-merge-on-create-on-match
ON CREATE is only used by MERGE when it needs to create something.
On the other hand, ON MATCH is used by MERGE when it does not need to create anything.
So, your new query should look like this (assuming that you added no new rows to the CSV file, but only columns):
LOAD CSV WITH HEADERS FROM 'file:///orders.csv' AS row
MERGE (order:Order {orderID: row.OrderID})
ON MATCH SET order.customerId = row.CustomerID, order.employeeID = row.EmployeeID;
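If the file may contain both new rows and new columns, one option is a plain SET after the MERGE, which runs whether the node was created or matched. A sketch using the column names from the question; the createdAt property is a hypothetical addition, only there to show what ON CREATE SET is still useful for:

```cypher
LOAD CSV WITH HEADERS FROM 'file:///orders.csv' AS row
MERGE (order:Order {orderID: row.OrderID})
// Runs only when the node is new (createdAt is a hypothetical property).
ON CREATE SET order.createdAt = timestamp()
// Runs for new and existing nodes alike.
SET order.shipName = row.ShipName,
    order.customerId = row.CustomerID,
    order.employeeID = row.EmployeeID;
```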
I'm very new to the Neo4j world so please forgive me if this is a trivial question. I have 2 tables I've loaded into the database using LOAD CSV
artists:
artist_name,artist_id
"Bob","abc"
"Jack","def"
"James","ghi"
"Someone","jkl"
"John","mno"
agency_list:
"Agency"
"A"
"B"
"C"
"D"
Finally, I have an intermediary table that has the artist and the agencies that represent them.
artist_agencies:
artist_name,artist_id,agency
"Bob","abc", "A"
"Bob","abc", "B"
"Jack","def", "C"
"James","ghi", "C"
"Someone","jkl","B"
"Someone","jkl", "C"
"John","mno", "D"
Notice some artists can be a part of multiple agencies (which is why I didn't include the agency variable in the Artist table)
I'm trying to get four agency nodes that connect to each artist based on a :REPRESENTS relationship. Basically something like:
(agency:Agency) - [:REPRESENTS] -> (artist:Artist)
The code I've tried is:
LOAD CSV WITH HEADERS FROM "file:///agency_list.csv" as agencies
CREATE (agency:Agency {agency: agencies.Agency})
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///artists.csv" as artists
CREATE (artist:Artist {artist: artists.artist_name, artist_id: artists.artist_id})
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///artist_agencies.csv" as line
CREATE (ag:Agency) - [:REPRESENTS] -> (ar:Artist {track_artist_uri:line.track_artist_uri})
So far, each blue node in my result is a duplicate of an agency name; instead, I want a single node per agency that connects to all of its artists via the :REPRESENTS relationship.
I guess my problem is that I don't know how to relate the artists table to the agency_list table via this intermediate artist_agencies table. Is there a better way to do this or am I on the right track?
Thanks!
Joey
The artist_agencies.csv query needs to find the appropriate Agency and Artist nodes before creating a relationship between them. For example:
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///artist_agencies.csv" as line
MATCH (ag:Agency) WHERE ag.agency = line.agency
MATCH (ar:Artist) WHERE ar.artist_id = line.artist_id
CREATE (ag)-[:REPRESENTS]->(ar)
Aside: The artist_agencies.csv file does not need the artist_name column.
[UPDATE]
If the artist_agencies.csv data could cause duplicate relationships to be created, replace CREATE with (the more expensive) MERGE to avoid that. And make sure you do not have duplicate Agency or Artist nodes.
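A sketch of the whole import with the UPDATE applied, assuming Neo4j 4.4 or later, hypothetical constraint names, and a database from which the duplicate Agency nodes have already been removed (creating a uniqueness constraint fails while duplicates exist). MERGE on the key property keeps re-runs from duplicating nodes or relationships, and the constraints give the MATCH lookups an index:

```cypher
// Hypothetical constraint names (Neo4j 4.4+ syntax).
CREATE CONSTRAINT agency_key IF NOT EXISTS FOR (ag:Agency) REQUIRE ag.agency IS UNIQUE;
CREATE CONSTRAINT artist_key IF NOT EXISTS FOR (ar:Artist) REQUIRE ar.artist_id IS UNIQUE;

// One node per agency name.
LOAD CSV WITH HEADERS FROM "file:///agency_list.csv" AS agencies
MERGE (:Agency {agency: agencies.Agency});

// One node per artist id; the name is stored as in the question.
LOAD CSV WITH HEADERS FROM "file:///artists.csv" AS artists
MERGE (ar:Artist {artist_id: artists.artist_id})
SET ar.artist = artists.artist_name;

// One REPRESENTS relationship per agency/artist pair.
LOAD CSV WITH HEADERS FROM "file:///artist_agencies.csv" AS line
MATCH (ag:Agency {agency: line.agency})
MATCH (ar:Artist {artist_id: line.artist_id})
MERGE (ag)-[:REPRESENTS]->(ar);
```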
I am importing the following to Neo4J:
categories.csv
CategoryName1
CategoryName2
CategoryName3
...
categories_relations.csv
category_parent category_child
CategoryName3 CategoryName10
CategoryName32 CategoryName41
...
Basically, categories_relations.csv shows parent-child relationships between the categories from categories.csv.
I imported the first csv file with the following query which went well and pretty quickly:
USING PERIODIC COMMIT
LOAD CSV FROM 'file:///categories.csv' as line
CREATE (:Category {name:line[0]})
Then I imported the second csv file with:
USING PERIODIC COMMIT
LOAD CSV FROM 'file:///categories_relations.csv' as line
MATCH (a:Category),(b:Category)
WHERE a.name = line[0] AND b.name = line[1]
CREATE (a)-[r:ISPARENTOF]->(b)
I have about 2 million nodes.
I tried executing the second query and it is taking a long time. Can I make it execute more quickly?
Confirm you are matching on the right property. When creating the categories, you set only one property on the Category nodes, name, so the second query must match on name as well; a property such as id that was never set will never match.
To execute the second query faster, add an index on the property you match Category nodes on (here name):
CREATE INDEX ON :Category(name)
If it is still slow, you can refer to my other answer about LOAD CSV.
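On Neo4j 4.x and later, the same index can be created with a name (category_name is hypothetical), and on Neo4j 5, where USING PERIODIC COMMIT no longer exists, the relationship import can be batched with CALL { ... } IN TRANSACTIONS. A sketch assuming Neo4j 4.4 or later and that Category nodes are matched on name, as in the question's query; in Neo4j Browser the batched query has to be prefixed with :auto.

```cypher
// Index the property the MATCH clauses look up (hypothetical index name).
CREATE INDEX category_name IF NOT EXISTS FOR (c:Category) ON (c.name);
// Wait up to 300 seconds for the index to come online before importing.
CALL db.awaitIndexes(300);

// Batched relationship import: one commit per 10000 rows.
LOAD CSV FROM 'file:///categories_relations.csv' AS line
CALL {
  WITH line
  MATCH (a:Category {name: line[0]})
  MATCH (b:Category {name: line[1]})
  CREATE (a)-[:ISPARENTOF]->(b)
} IN TRANSACTIONS OF 10000 ROWS;
```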
I have a really simple CSV that I created so I can practice loading CSVs into Neo4j.
The CSV looks like this:
boxer_id,name,boxer_country,total_wins,bdate,fought,fight_id,fight_location,outcome
1,Glass Joe,France,0,1/2/80,2,100,Las Vegas,L
2,Bald Bull,Turkey,2,2/3/81,1,100,Macao,W
3,Soda Popinski,Russia,6,3/4/82,4,101,Atlantic City,L
4,Sandman,USA,9,4/5/83,3,101,Japan,W
I want to make 2 nodes, boxer and fight.
But I'm having trouble connecting the boxers to the fights.
I successfully read in the nodes, but I don't know how to create the relationship between boxers and their boxing matches.
I want to do something like:
CREATE (boxer)-[:AGAINST]->(boxer)
but this doesn't make sense. I need to use the field fought, which encapsulates the information regarding who has faced who in the ring.
Any advice would be greatly appreciated. I'm not sure how to do this in the context of LOAD CSV.
Here's my code:
// The goal here is to create a node called Boxer, and pull in properties.
LOAD CSV WITH HEADERS FROM
'file:///test.csv' AS line
WITH line, SPLIT(line.bdate, '/') AS bdate
CREATE (b:boxer {boxer_id: line.boxer_id})
SET b.byear = toInteger(bdate[2]),
b.bmonth = toInteger(bdate[0]),
b.bday = toInteger(bdate[1]),
b.name = line.name,
b.country = line.boxer_country,
b.total_wins = toInteger(line.total_wins)
// Now we make a node called Fight
LOAD CSV WITH HEADERS FROM
'file:///test.csv' AS line
CREATE (f:fight {fight_id: line.fight_id, fight_loc: line.fight_location})
// Now we set relationships
// ????
You could add a few lines to match the boxers you already created and create relationships between them and the newly created fight. I am thinking something along these lines might work for you...
LOAD CSV WITH HEADERS FROM
'file:///test.csv' AS line
MATCH (b1:boxer {boxer_id: line.boxer_id})
WITH line, b1
MATCH (b2:boxer {boxer_id: line.fought})
MERGE (f:fight {fight_id: line.fight_id})
CREATE (b1)-[:AGAINST]->(b2)
// MERGE rather than CREATE: each fight appears on two rows (one per boxer)
MERGE (b1)-[:FOUGHT_IN]->(f)
MERGE (b2)-[:FOUGHT_IN]->(f)
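To check the result of the query above, you could list each fight with the boxers connected to it. A sketch using the lowercase labels from the question; each boxer should appear once per fight, and a name listed twice means duplicate FOUGHT_IN relationships were created:

```cypher
MATCH (b:boxer)-[:FOUGHT_IN]->(f:fight)
RETURN f.fight_id AS fight, collect(b.name) AS boxers
ORDER BY fight;
```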
One option is to just model fights as relationships between Boxer nodes, instead of creating the Fight nodes:
LOAD CSV WITH HEADERS FROM 'file:///test.csv' AS line
MERGE (b1:Boxer {boxer_id: line.boxer_id})
MERGE (b2:Boxer {boxer_id: line.fought})
CREATE (b1)-[f:fought]->(b2)
SET f.location = line.fight_location,
f.outcome = line.outcome
However it probably makes more sense to model the fights as nodes, since they are events. In that case something like this:
LOAD CSV WITH HEADERS FROM 'file:///test.csv' AS line
MATCH (b:Boxer {boxer_id: line.boxer_id})
MERGE (f:Fight {fight_id: line.fight_id})
ON CREATE SET f.location = line.fight_location
CREATE (b)-[r:FOUGHT_IN]->(f)
WITH r, CASE line.outcome WHEN "W" THEN [1] ELSE [] END AS win
FOREACH (x IN win | SET r.winner = TRUE)
Note here that we are storing the outcome of the fight as a property on the :FOUGHT_IN relationship.
Edit: Updated to use MERGE to avoid creating duplicate Fight nodes. When using MERGE, you should also create a uniqueness constraint before running the import script: CREATE CONSTRAINT ON (f:Fight) ASSERT f.fight_id IS UNIQUE;
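The CREATE CONSTRAINT ... ASSERT syntax above was removed in Neo4j 5. A sketch of the same import for Neo4j 4.4 or later, with hypothetical constraint names and the capitalized labels Boxer and Fight used consistently. It stores the outcome as a boolean winner property on every FOUGHT_IN relationship (false for the loser), which is a simpler alternative to the FOREACH trick, though not identical to it:

```cypher
// Hypothetical constraint names (Neo4j 4.4+ syntax).
CREATE CONSTRAINT boxer_key IF NOT EXISTS FOR (b:Boxer) REQUIRE b.boxer_id IS UNIQUE;
CREATE CONSTRAINT fight_key IF NOT EXISTS FOR (f:Fight) REQUIRE f.fight_id IS UNIQUE;

LOAD CSV WITH HEADERS FROM 'file:///test.csv' AS line
// One Boxer node per boxer_id, with its properties refreshed on each run.
MERGE (b:Boxer {boxer_id: line.boxer_id})
SET b.name = line.name,
    b.country = line.boxer_country,
    b.total_wins = toInteger(line.total_wins)
// One Fight node per fight_id; the location comes from the first row seen.
MERGE (f:Fight {fight_id: line.fight_id})
ON CREATE SET f.location = line.fight_location
// One FOUGHT_IN relationship per boxer and fight.
MERGE (b)-[r:FOUGHT_IN]->(f)
SET r.winner = (line.outcome = 'W');
```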