I have a CSV file, which have 11 columns : Rank, Year, Name...
It contains the best video games sales. I'm new to neo4j and cypher.
I am trying to import it to neo4j with this cypher query :
LOAD CSV WITH HEADERS FROM 'file:///vgsales.csv' AS line CREATE (:Vgsales {rank: toInteger(line.Rank), name: line.Name, platform: line.Platform, year: toInteger(line.Year), genre: line.Genre, publisher: line.Publisher, NA_sales: toInteger(line.NA_Sales), EU_sales: toInteger(line.EU_Sales)], JP_sales: toInteger(line.JP_Sales), Other_sales: toInteger(line.Other_Sales), Global_sales: toInteger(line.Global_Sales)})
When i do this, I have the nodes, but there is no reationships between them, and I need to give the graph model with this query : call db.schema.visualization but there is only one empty node when I do this.
I don't understand why there isn't any relationships .
There is a syntax error on your script. You can remove this line below:
WITH v
I tried it on my neo4j browser and it works well:
LOAD CSV WITH HEADERS FROM 'file:///vgsales.csv' AS line
CREATE(v:Vgsales {rank: toInteger(line.Rank), name: line.Name, platform: line.Platform, year:toInteger(line.Year)})
//WITH v <- remove this!
MERGE (g:GENRE {genre: line.Genre})
MERGE (p:PUBLISHER {publisher: line.Publisher, NA_sales: toInteger(line.NA_Sales),
EU_sales: toInteger(line.EU_Sales), JP_sales: toInteger(line.JP_Sales), Other_sales: toInteger(line.Other_Sales), Global_sales: toInteger(line.Global_Sales)})
MERGE (v)-[:IN_GENRE]->(g)
MERGE (p)-[:PUBLISHED]->(v)
Result: Added 3 labels, created 3 nodes, set 11 properties, created 2 relationships, completed after 235 ms.
Thanks for your answer.
I'm posting back the query because there was an ']' that i forgot to remove :
LOAD CSV WITH HEADERS FROM 'file:///vgsales.csv' AS line CREATE(v:Vgsales {rank: toInteger(line.Rank), name: line.Name, platform: line.Platform, year:toInteger(line.Year)}) WITH v MERGE (g:GENRE {genre: line.Genre}) MERGE (p:PUBLISHER {publisher: line.Publisher, NA_sales: toInteger(line.NA_Sales), EU_sales: toInteger(line.EU_Sales), JP_sales: toInteger(line.JP_Sales), Other_sales: toInteger(line.Other_Sales), Global_sales: toInteger(line.Global_Sales)}) MERGE (v)-[:IN_GENRE]->(g) MERGE (p)-[:PUBLISHED]->(v)
However the query still doesn't work. I have this error : enter image description here
This is how my dataset looks like : enter image description here
The exercise that I must do for tomorrow is to find a dataset, to find a problematic and to answer it with a plugin algorithm and then get the graph model and load the csv file in neo4j but i don't know how should add the relationships between the nodes.
From your code it seems like you have not created any relationship just a single node for each row in your csv.
My suggesion is try to create a model first. you can use arrows.app to try and describe your model.
Relationships are created by joining two nodes lets say
CREATE (:PERSON {name:"CHARLIE")-[:FOLLOWS]->(:PERSON {name:"JOHN"})
And from your code id probably try something like
LOAD CSV WITH HEADERS FROM 'file:///vgsales.csv' AS line CREATE(v:Vgsales {rank: toInteger(line.Rank), name: line.Name, platform: line.Platform, year: toInteger(line.Year)}) WITH v MERGE (g:GENRE {genre: line.Genre}) MERGE (p:PUBLISHER {publisher: line.Publisher, NA_sales: toInteger(line.NA_Sales), EU_sales: toInteger(line.EU_Sales)], JP_sales: toInteger(line.JP_Sales), Other_sales: toInteger(line.Other_Sales), Global_sales: toInteger(line.Global_Sales)}) MERGE (v)-[:IN_GENRE]->(g) MERGE (p)-[:PUBLISHED]->(v)
I'm not sure that fits your model though you could try and draw your model and perhaps i will write a better code.
Related
I've got two csv files Job (30,000 entries) and Cat (30 entries) imported into neo4j and am trying to create a relationship between them
Each Job has a cat_ID and Cat contains the category name and ID
after executing the following
LOAD CSV WITH HEADERS FROM 'file:///DimCategory.csv' AS row
MATCH (job:Job {cat_ID: row.cat_ID})
MATCH (cat:category {category: row.category})
CREATE (job)-[r:under]->(cat)
it returns (no changes, no records)
I received a prompt recommending that I index the category and so using
Create INDEX ON :Job(cat_id); I did, but I still get the same error
How do I create a relationship between the two?
I am able to get this to work on a smaller dataset
You are probably trying to match on non-existing nodes. Try
LOAD CSV WITH HEADERS FROM 'file:///DimCategory.csv' AS row
MERGE (job:Job {cat_ID: row.cat_ID})
MERGE (cat:category {category: row.category})
CREATE (job)-[r:under]->(cat)
Have a look in your logs and see if you are running out of memory.
You could try chunking the data set up into smaller pieces with Periodic Commit and see if that helps:
:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///DimCategory.csv' AS row
MATCH (job:Job {cat_ID: row.cat_ID})
MATCH (cat:category {category: row.category})
CREATE (job)-[r:under]->(cat)
I am importing the following to Neo4J:
categories.csv
CategoryName1
CategoryName2
CategoryName3
...
categories_relations.csv
category_parent category_child
CategoryName3 CategoryName10
CategoryName32 CategoryName41
...
Basically, categories_relations.csv shows parent-child relationships between the categories from categories.csv.
I imported the first csv file with the following query which went well and pretty quickly:
USING PERIODIC COMMIT
LOAD CSV FROM 'file:///categories.csv' as line
CREATE (:Category {name:line[0]})
Then I imported the second csv file with:
USING PERIODIC COMMIT
LOAD CSV FROM 'file:///categories_relations.csv' as line
MATCH (a:Category),(b:Category)
WHERE a.name = line[0] AND b.name = line[1]
CREATE (a)-[r:ISPARENTOF]->(b)
I have about 2 million nodes.
I tried executing the 2nd query and it is taking quite long. Can I make the query execute more quickly?
Confirm you are matching on right property. You are setting only one property for Category node i.e. name while creating
categories. But you are matching on property id in your second
query to create the relationships between categories.
For executing the 2nd query faster you can add an index on the property (here id) which you are matching Category nodes on.
CREATE INDEX ON :Category(id)
If it still takes time, You can refer my answer to Load CSV here
So I've been trying to load a csv file where participants have had to rate whom they will get advice from/talk to when they have problems with studying. the table looks something like this:
The alphabets are just names of the people. As you can see there are nulls in this table. I'm trying to load this into Neo4j so we can visualise who is choosing who and if this relationship is reciprocal. Any idea? All help is much appreciated!
Using IS NOT NULL can solve your problem.
LOAD CSV WITH HEADERS FROM file:///xyz.csv AS line
WITH line LIMIT 10
RETURN line
Using this you can see how your data is being loaded.(Don't forget to use limit). Since all the values loaded from CSV are in string format, you'll get your empty column values as this -> "".
From that you can create your node by following the blog i've referenced. And also using IS NOT NULL you can skip the null values and create your schema.
Example:
MERGE (n:Person{name:line.Person})-[:CHOSE]-(:Study1{name:line[1]})
MERGE (n)-[:CHOSE]-(:Study2{name:line[2]})
MERGE (n)-[:CHOSE]-(:Study3{name:line[3]})
MERGE (n)-[:CHOSE]-(:Study4{name:line[4]})
MERGE (n)-[:CHOSE]-(:Study5{name:line[5]})
OR you can use
WITH Line[1] as Person, Line[2] as Study1 and so on...
WHERE Study5 IS NOT NULL
MERGE (n:Person{name:line.Person})-[:CHOSE]-(:Study1{name:line[1]})
MERGE (n)-[:CHOSE]-(:Study2{name:line[2]})
MERGE (n)-[:CHOSE]-(:Study3{name:line[3]})
MERGE (n)-[:CHOSE]-(:Study4{name:line[4]})
MERGE (n)-[:CHOSE]-(:Study5{name:line[5]})
For more detail go through this example.
Hope this helps!
I'm very new to Neo4j, been playing around with it for a couple of days now.
I'm trying to use Neo4j to map our company's database by showing how one table is related to another (data is pulled to or pushed from one table to another) and what scripts are used to do this pulling and pushing. To do this, I'm using three different properties: TableName, ScriptName, and TableTouch.
TableName: Table node which corresponds to the name of a table
ScriptName: Script Node which corresponds to the script which
updatesa table
TableTouch: Used to show which table affects another
table
Here is an example of the .CSV I'm importing:
TableName ScriptName TableTouch
Source ScriptA Water/Oil
Water ScriptB Source
Oil ScriptC Source
Here is the code I have thus far:
CREATE CONSTRAINT ON (c:Table) ASSERT c.TableName IS UNIQUE;
CREATE CONSTRAINT ON (c:Scripts) ASSERT c.ScriptName IS UNIQUE;
LOAD CSV WITH HEADERS FROM
"file:///C:\\NeoTest.CSV" AS line
MERGE (table:Table {TableName: UPPER(line.TableName)})
SET table.TableTouch = UPPER(line.TableTouch)
MERGE (script:Scripts {ScriptName: UPPER(line.ScriptName)})
MERGE (table) - [:UPDATED_BY] -> (script)
This will relate scripts to their appropriate tables and load in all the table and script nodes.
Now an example of what I need is for Node "Source" to connect to Node "Water" because "Source" = Water.TableTouch and "Water" = Source.TableTouch.
Assume any given table could have multiple tables listed in the TableTouch property.
I want the TableName nodes to connect to other TableName nodes where the TableName of one node is found in the TableName.TableTouch of another node. How would I go about doing this? Do I need to have my .CSV formatted differently for this?
Thanks,
-Andrew
Edit: This may make things more clear
What I have:
What I'd like to have (red arrows):
[UPDATED]
If I understand your scenario, you want to represent the Script that is used to generate each Table, and what other table was used by that Script.
And, if I understand the meaning of your CSV file and your pictures, it looks like the Source table is generated by ScriptA without using data from any other tables. If so, you can create your CSV file to look something like this (where the Source table row's TableTouch column has the special value NOTHING -- you can use some other name -- to indicate that column actually has no value):
TableName,ScriptName,TableTouch
Source,ScriptA,NOTHING
Water,ScriptB,Source
Oil,ScriptC,Source
Data model:
(src:Table {name: 'Source'})<-[:USES]-(s:Script {name: 'ScriptC'})-[:MAKES]->(dest:Table {name: 'Oil'})
Note: This data model allows a single Script to "use" any number of source Tables and "make" any number of destination Tables.
Create Constraints
CREATE CONSTRAINT ON (t:Table) ASSERT t.name IS UNIQUE;
CREATE CONSTRAINT ON (s:Script) ASSERT s.name IS UNIQUE;
Import data
LOAD CSV WITH HEADERS FROM "file:///C:\\NeoTest.CSV" AS line
MERGE (src:Table {name: line.TableTouch})
MERGE (dest:Table {name: line.TableName})
MERGE (script:Script {name: line.ScriptName})
MERGE (script)-[:USES]->(src)
MERGE (script)-[:MAKES]->(dest)
Note: To keep the query simple, we just go ahead and create (at most) one NOTHING node to represent the absence of a source Table.
Results
I have a really simple CSV that I created so I can practice loading CSVs into Neo4j.
The CSV looks like this:
boxer_id name boxer_country total_wins bdate fought fight_id fight_location outcome
1 Glass Joe France 0 1/2/80 2 100 Las Vegas L
2 Bald Bull Turkey 2 2/3/81 1 100 Macao W
3 Soda Popinski Russia 6 3/4/82 4 101 Atlantic City L
4 Sandman USA 9 4/5/83 3 101 Japan W
I want to make 2 nodes, boxer and fight.
But I'm having trouble connecting the boxers to the fights.
Here's as far as I got:
As you can see, I successfully read in the nodes, but I don't know how to create the relationship between boxers and their boxing matches.
I want to do something like:
CREATE (boxer)-[:AGAINST]->(boxer)
but this doesn't make sense. I need to use the field fought, which encapsulates the information regarding who has faced who in the ring.
Any advice would be greatly appreciated. I'm not sure how to do this in the context of READ CSV.
Here's my code:
// The goal here is to create a node called Boxer, and pull in properties.
LOAD CSV WITH HEADERS FROM
'file:///test.csv' AS line
WITH line, SPLIT(line.bdate, '/') AS bdate
CREATE (b:boxer {boxer_id: line.boxer_id})
SET b.byear= TOINT(bdate[2]),
b.bmonth= TOINT(bdate[0]),
b.bday = TOINT(bdate[1]),
b.name = line.name,
b.country = line.boxer_country,
b.total_wins = TOINT(line.total_wins)
// Now we make a node called Fight
LOAD CSV WITH HEADERS FROM
'file:///test.csv' AS line
CREATE (f:fight {fight_id: line.fight_id, fight_loc: line.fight_location})
// Now we set relationships
// ????
You could add a few lines to match the boxers you already created and create relationships between them and the newly created fight. I am thinking something along these lines might work for you...
LOAD CSV WITH HEADERS FROM
'file:///test.csv' AS line
MATCH (b1:boxer {boxer_id: line.boxer_id})
WITH line, b1
MATCH (b2:boxer {boxer_id: line.fought})
MERGE (f:fight {fight_id: line.fight_id})
CREATE (b1)-[:AGAINST]->(b2)
CREATE (b1)-[:FOUGHT_IN]->(f)
CREATE (b2)-[:FOUGHT_IN]->(f)
One option is to just model fights as relationships between Boxer nodes, instead of creating the Fight nodes:
LOAD CSV WITH HEADERS FROM 'file:///test.csv' AS line
MERGE (b1:Boxer {boxer_id: line.boxer_id})
MERGE (b2:Boxer {boxer_id: line.fought})
CREATE (b1)-[f:fought]->(b2)
SET f.location = line.fight_location,
f.outcome = line.outcome
However it probably makes more sense to model the fights as nodes, since they are events. In that case something like this:
LOAD CSV WITH HEADERS FROM 'file:///text.csv' AS line
MATCH (b:Boxer {boxer_id: line.boxer_id})
MERGE (f:fight {fight_id: line.fight_id})
ON CREATE SET f.location = line.fight_location
CREATE (b)-[r:FOUGHT_IN]->(f)
WITH r, CASE line.outcome WHEN "W" THEN [1] ELSE [] END AS win
FOREACH (x IN win | SET r.winner = TRUE)
Note here that we are storing the outcome of the fight as a property on the :FOUGHT_IN relationship.
Edit Updated to use MERGE to avoid creating duplicate Fight nodes. When using MERGE you should also create a uniqueness constraint: CREATE CONSTRAINT ON (f:Fight) ASSERT f.fight_id IS UNIQUE; before running the import script.