So I've been trying to load a CSV file in which participants rated whom they would get advice from or talk to when they have problems with studying. The table looks something like this:
The letters are just names of the people. As you can see, there are nulls in this table. I'm trying to load this into Neo4j so we can visualise who is choosing whom and whether the relationship is reciprocal. Any ideas? All help is much appreciated!
Using IS NOT NULL can solve your problem.
LOAD CSV WITH HEADERS FROM 'file:///xyz.csv' AS line
WITH line LIMIT 10
RETURN line
Using this you can see how your data is being loaded (don't forget to use LIMIT). Since all the values loaded from a CSV are strings, you'll get your empty column values as this -> "".
From that you can create your nodes by following the blog I've referenced. And by using IS NOT NULL you can skip the null values and create your schema.
Example:
MERGE (n:Person{name:line.Person})-[:CHOSE]-(:Study1{name:line[1]})
MERGE (n)-[:CHOSE]-(:Study2{name:line[2]})
MERGE (n)-[:CHOSE]-(:Study3{name:line[3]})
MERGE (n)-[:CHOSE]-(:Study4{name:line[4]})
MERGE (n)-[:CHOSE]-(:Study5{name:line[5]})
Or you can use:
WITH line, line[1] AS Person, line[2] AS Study1, and so on...
WHERE Study5 IS NOT NULL
MERGE (n:Person{name:line.Person})-[:CHOSE]-(:Study1{name:line[1]})
MERGE (n)-[:CHOSE]-(:Study2{name:line[2]})
MERGE (n)-[:CHOSE]-(:Study3{name:line[3]})
MERGE (n)-[:CHOSE]-(:Study4{name:line[4]})
MERGE (n)-[:CHOSE]-(:Study5{name:line[5]})
For more detail go through this example.
Hope this helps!
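A minimal sketch of the whole import, under some assumptions: the headers are taken to be Person and Study1 to Study5 (guessed from the aliases above, so rename them to match your file), and both ends of a choice are stored under a single Person label. That last part is my own choice rather than something from the answer above; it is what makes it possible to check whether a choice is reciprocal. Empty cells are filtered whether they arrive as null or as "".

```cypher
// Assumed headers: Person, Study1 ... Study5 (rename to match your file)
LOAD CSV WITH HEADERS FROM 'file:///xyz.csv' AS line
WITH line,
     [c IN [line.Study1, line.Study2, line.Study3, line.Study4, line.Study5]
      WHERE c IS NOT NULL AND c <> ''] AS choices
MERGE (p:Person {name: line.Person})
WITH p, choices
UNWIND choices AS choice
MERGE (q:Person {name: choice})
MERGE (p)-[:CHOSE]->(q);

// Reciprocal pairs, each pair listed once
MATCH (a:Person)-[:CHOSE]->(b:Person)-[:CHOSE]->(a)
WHERE a.name < b.name
RETURN a.name, b.name;
```

The first statement creates the participant even when every choice column is empty, because the MERGE on the participant runs before the UNWIND.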
I have a CSV file which has 11 columns: Rank, Year, Name...
It contains the best-selling video games. I'm new to Neo4j and Cypher.
I am trying to import it into Neo4j with this Cypher query:
LOAD CSV WITH HEADERS FROM 'file:///vgsales.csv' AS line CREATE (:Vgsales {rank: toInteger(line.Rank), name: line.Name, platform: line.Platform, year: toInteger(line.Year), genre: line.Genre, publisher: line.Publisher, NA_sales: toInteger(line.NA_Sales), EU_sales: toInteger(line.EU_Sales)], JP_sales: toInteger(line.JP_Sales), Other_sales: toInteger(line.Other_Sales), Global_sales: toInteger(line.Global_Sales)})
When I do this, I get the nodes, but there are no relationships between them. I need to produce the graph model with the query call db.schema.visualization, but it only shows one empty node.
I don't understand why there aren't any relationships.
There is an error in your script: WITH v passes only v forward, so line is no longer in scope for the MERGE clauses that follow. You can remove this line below:
WITH v
I tried it on my neo4j browser and it works well:
LOAD CSV WITH HEADERS FROM 'file:///vgsales.csv' AS line
CREATE(v:Vgsales {rank: toInteger(line.Rank), name: line.Name, platform: line.Platform, year:toInteger(line.Year)})
//WITH v <- remove this!
MERGE (g:GENRE {genre: line.Genre})
MERGE (p:PUBLISHER {publisher: line.Publisher, NA_sales: toInteger(line.NA_Sales),
EU_sales: toInteger(line.EU_Sales), JP_sales: toInteger(line.JP_Sales), Other_sales: toInteger(line.Other_Sales), Global_sales: toInteger(line.Global_Sales)})
MERGE (v)-[:IN_GENRE]->(g)
MERGE (p)-[:PUBLISHED]->(v)
Result: Added 3 labels, created 3 nodes, set 11 properties, created 2 relationships, completed after 235 ms.
Thanks for your answer.
I'm posting the query again because there was a ']' that I forgot to remove:
LOAD CSV WITH HEADERS FROM 'file:///vgsales.csv' AS line CREATE(v:Vgsales {rank: toInteger(line.Rank), name: line.Name, platform: line.Platform, year:toInteger(line.Year)}) WITH v MERGE (g:GENRE {genre: line.Genre}) MERGE (p:PUBLISHER {publisher: line.Publisher, NA_sales: toInteger(line.NA_Sales), EU_sales: toInteger(line.EU_Sales), JP_sales: toInteger(line.JP_Sales), Other_sales: toInteger(line.Other_Sales), Global_sales: toInteger(line.Global_Sales)}) MERGE (v)-[:IN_GENRE]->(g) MERGE (p)-[:PUBLISHED]->(v)
However, the query still doesn't work. I get this error: (screenshot not included)
This is what my dataset looks like: (screenshot not included)
The exercise I have to finish for tomorrow is to find a dataset, define a problem to solve, answer it with a plugin algorithm, and then produce the graph model and load the CSV file into Neo4j. But I don't know how I should add the relationships between the nodes.
From your code it seems like you have not created any relationships, just a single node for each row in your CSV.
My suggestion is to create a model first. You can use arrows.app to sketch and describe your model.
Relationships are created by joining two nodes, let's say:
CREATE (:PERSON {name:"CHARLIE"})-[:FOLLOWS]->(:PERSON {name:"JOHN"})
And from your code I'd probably try something like:
LOAD CSV WITH HEADERS FROM 'file:///vgsales.csv' AS line
CREATE (v:Vgsales {rank: toInteger(line.Rank), name: line.Name, platform: line.Platform, year: toInteger(line.Year)})
WITH v, line
MERGE (g:GENRE {genre: line.Genre})
MERGE (p:PUBLISHER {publisher: line.Publisher, NA_sales: toInteger(line.NA_Sales), EU_sales: toInteger(line.EU_Sales), JP_sales: toInteger(line.JP_Sales), Other_sales: toInteger(line.Other_Sales), Global_sales: toInteger(line.Global_Sales)})
MERGE (v)-[:IN_GENRE]->(g)
MERGE (p)-[:PUBLISHED]->(v)
I'm not sure that fits your model, though. If you draw your model, perhaps I can write better code.
I'm new to Neo4j and I'd like to upload a CSV file and create a set of nodes. However, I already have some existing nodes that may also appear in that CSV file. Is there an option to load the CSV, create the nodes based on each row, and skip the row if the node already exists?
Thanks
You can use the MERGE clause to avoid creating duplicate nodes and relationships.
However, you need to carefully read the documentation to understand how to use MERGE, as incorrect usage can cause the unintentional creation of nodes and relationships.
MERGE will give you what you want; however, you must be careful about how you identify each record uniquely, to prevent creating duplicates.
I'll put the desired final form first as attention spans seem to be on the decline...
// This one is safe assuming name is a true unique identifier of your Friends
// and that their favorite colors and foods may change over time
LOAD CSV FROM 'data/friends.csv' AS line
MERGE (f:Friend { name: line[0]})
SET f.favorite_food = line[1]
SET f.favorite_color = line[2]
The merge above will create or find the Friend node with that matching name and then, regardless of whether we are creating it or updating it, set the attributes on it.
If we were to instead provide all the attributes in the merge as such:
// This one is dangerous - all attributes must match in order
// to find the existing Friend node
LOAD CSV FROM 'data/friends.csv' AS line
MERGE (f:Friend { name: line[0], favorite_food: line[1], favorite_color: line[2]})
Then we would fail to find an existing friend every time their favorite_food or favorite_color was updated in the data being (re)loaded.
Here's an example for anyone whose imagination hasn't fully filled in the blanks...
//Last month's file contained:
Bob Marley,Hemp Seeds,Green
//This month's file contained:
Bob Marley,Soylent Green,Rainbow
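If the goal is to leave nodes that already exist completely untouched (which is what the question asks for), one option is ON CREATE SET, so the attributes are only written when the MERGE actually creates the node. A minimal sketch, assuming name is the unique key and using 'file:///friends.csv' as a placeholder path:

```cypher
// 'file:///friends.csv' is a placeholder path; name is assumed to be unique
LOAD CSV FROM 'file:///friends.csv' AS line
MERGE (f:Friend {name: line[0]})
ON CREATE SET f.favorite_food = line[1],
              f.favorite_color = line[2]
```

With last month's and this month's files from the example, Bob Marley would keep Hemp Seeds and Green after the second load, because the second MERGE matches the existing node and the ON CREATE part does not run.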
I just downloaded and installed Neo4j. Now I'm working with a simple CSV that looks like this:
So first I'm using this to merge the nodes for that file:
LOAD CSV WITH HEADERS FROM 'file:///Athletes.csv' AS line
MERGE(Rank:rank{rang: line.Rank})
MERGE(Name:name{nom: line.Name})
MERGE(Sport:sport{sport: line.Sport})
MERGE(Nation:nation{pays: line.Nation})
MERGE(Gender: gender{genre: line.Gender})
MERGE(BirthDate:birthDate{dateDeNaissance: line.BirthDate})
MERGE(BirthPlace: birthplace{lieuDeNaissance: line.BirthPlace})
MERGE(Height: height{taille: line.Height})
MERGE(Pay: pay{salaire: line.Pay})
and this to create some constraint for that file:
CREATE CONSTRAINT ON(name:Name) ASSERT name.nom IS UNIQUE
CREATE CONSTRAINT ON(rank:Rank) ASSERT rank.rang IS UNIQUE
Then I want to show which country each athlete lives in. For that I use:
Create(name)-[:WORK_AT]->(nation)
But this is what I get:
I would like to know why this happens, please.
Thanks in advance to anyone who takes the time to help me.
Several issues come to mind:
If your CREATE clause is part of your first query: since the CREATE clause uses the variable names name and nation, and your MERGE clauses use Name and Nation (which have different casing) -- the CREATE clause would just create new nodes instead of using the Name and Nation nodes.
If your CREATE clause is NOT part of your first query: your CREATE clause would just create new nodes (since variable names, even assuming they had the same casing, are local to a query and are not stored in the DB).
Solution: You can add this clause to the end of the first query:
CREATE (Name)-[:WORK_AT]->(Nation)
Yes, I agree with @cybersam: it's a case-sensitivity issue with the 'name' and 'nation' variables.
My suggestion:
MERGE (Name)-[:WORK_AT]->(Nation)
I see that you're using MERGE for the nodes, so in case any values of Name or Nation are duplicated, you should use MERGE instead of CREATE for the relationship as well.
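Putting both answers together, a trimmed-down sketch of a single query where the variables keep the same casing from the node MERGE to the relationship MERGE. The labels Athlete and Nation and the variable names are my own choices for illustration, not the asker's; the property names and CSV headers are taken from the question.

```cypher
// Labels Athlete and Nation are illustrative; headers come from the question
LOAD CSV WITH HEADERS FROM 'file:///Athletes.csv' AS line
MERGE (athlete:Athlete {nom: line.Name})
MERGE (nation:Nation {pays: line.Nation})
// Same variables, same casing, same query: no stray empty nodes are created
MERGE (athlete)-[:WORK_AT]->(nation)
```

Because variables only live for the duration of one query, the relationship clause has to stay in the same statement as the LOAD CSV.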
I have a column in a csv that looks like this:
I am using this code to test how the splitting of the dates is working:
LOAD CSV WITH HEADERS FROM
'file:///..some_csv.csv' AS line
WITH
SPLIT(line.date_of_birth, '/') AS date_of_birth
return date_of_birth;
This code block works fine and gives me what I'd expect, which is a collection of three values for each date, or a null if there was no date, e.g.:
[4, 5, 1971]
[0, 0, 2003]
[0, 0, 2005]
. . .
null
null
. . .
My question is, what is this problem with the nulls that are created, and why can't I do a MERGE when there are nulls?
LOAD CSV WITH HEADERS FROM
'file:///..some_csv.csv' AS line
WITH
SPLIT(line.date_of_birth, '/') AS date_of_birth, line
MERGE (p:Person {
date_of_birth: date_of_birth
});
This block above gives me the error:
Cannot merge node using null property value for date_of_birth
I have searched around and have only found one other SO question about this error, which has no answer. Other searches didn't help.
I was under the impression that if there isn't a value, then Neo4j simply doesn't create the element.
I figured maybe the node can't be generated since, after all, how can a node be generated if there is no value to generate it from? So, since I know there are no ID's missing, maybe I could MERGE with ID and date, so Neo4j always sees a value.
But this code didn't fare any better (same error message):
LOAD CSV WITH HEADERS FROM
'file:///..some_csv.csv' AS line
WITH
SPLIT(line.date_of_birth, '/') AS date_of_birth, line
MERGE (p:Person {
ID: line.ID
,date_of_birth: date_of_birth
});
My next idea is that maybe this error is because I'm trying to split a null value on slashes? Maybe the whole issue is due to the SPLIT.
But alas, same error when simplified to this:
LOAD CSV WITH HEADERS FROM
'file:///..some_csv.csv' AS line
WITH line
MERGE (p:Person {
subject_person_id: line.subject_person_id
,date_of_birth: line.date_of_birth
});
So I don't really understand the cause of the error. Thanks for looking at this.
EDIT
@stdob-- and @cybersam have both answered with equally excellent responses. If you came here via Google, please consider them as if both were accepted.
As @cybersam said, MERGE does not work well when a property in the pattern is null. So you can MERGE on the identifying property only and set the rest with ON CREATE and ON MATCH:
LOAD CSV WITH HEADERS FROM
'file:///..some_csv.csv' AS line
MERGE (p:Person {
subject_person_id: line.subject_person_id
})
ON CREATE SET p.date_of_birth = line.date_of_birth
ON MATCH SET p.date_of_birth = line.date_of_birth
Some Cypher queries, like MERGE, do not work well with NULL values.
The somewhat tricky workaround for handling this situation with MERGE is to use the FOREACH clause to conditionally perform the MERGE. This query might work for you:
LOAD CSV WITH HEADERS FROM 'file:///..some_csv.csv' AS line
FOREACH (x IN CASE WHEN line.date_of_birth IS NULL THEN [] ELSE [1] END |
MERGE (:Person {date_of_birth: SPLIT(line.date_of_birth, '/')})
);
Another solution that I've been rather fond of is to just tell cypher to skip rows in which the field of interest is NULL as follows:
USING PERIODIC COMMIT #
LOAD CSV WITH HEADERS FROM
'file:///.../csv.csv' AS line
WITH line, SPLIT(line.somedatefield, delimiter) AS date
WHERE NOT line.somedatefield IS NULL
[THE REST OF YOUR QUERY INVOLVING THE FIELD]
Or you can use COALESCE(n.property, defaultValue), which returns the default when the property is null.
Following Vojtech Ruzicka's approach, you can use something like this: your_value: COALESCE(line.your_value, 'default value')
Link to the documentation here, in case you need more information.
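For the query from the question, the COALESCE approach might look like the sketch below. The default 'unknown' is a made-up placeholder, and the sketch assumes subject_person_id is never null (otherwise the same error comes back for that property).

```cypher
// 'unknown' is a hypothetical default; pick whatever suits your data
LOAD CSV WITH HEADERS FROM 'file:///..some_csv.csv' AS line
MERGE (p:Person {
  subject_person_id: line.subject_person_id,
  date_of_birth: coalesce(line.date_of_birth, 'unknown')
})
```

The trade-off is that the placeholder is stored as a real property value, so later queries have to treat 'unknown' as "no date".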
I'm very new to Neo4j, been playing around with it for a couple of days now.
I'm trying to use Neo4j to map our company's database by showing how one table is related to another (data is pulled to or pushed from one table to another) and what scripts are used to do this pulling and pushing. To do this, I'm using three different properties: TableName, ScriptName, and TableTouch.
TableName: Table node, which corresponds to the name of a table
ScriptName: Script node, which corresponds to the script that updates a table
TableTouch: Used to show which table affects another table
Here is an example of the .CSV I'm importing:
TableName ScriptName TableTouch
Source ScriptA Water/Oil
Water ScriptB Source
Oil ScriptC Source
Here is the code I have thus far:
CREATE CONSTRAINT ON (c:Table) ASSERT c.TableName IS UNIQUE;
CREATE CONSTRAINT ON (c:Scripts) ASSERT c.ScriptName IS UNIQUE;
LOAD CSV WITH HEADERS FROM
"file:///C:\\NeoTest.CSV" AS line
MERGE (table:Table {TableName: UPPER(line.TableName)})
SET table.TableTouch = UPPER(line.TableTouch)
MERGE (script:Scripts {ScriptName: UPPER(line.ScriptName)})
MERGE (table) - [:UPDATED_BY] -> (script)
This will relate scripts to their appropriate tables and load in all the table and script nodes.
Now an example of what I need is for Node "Source" to connect to Node "Water" because "Source" = Water.TableTouch and "Water" = Source.TableTouch.
Assume any given table could have multiple tables listed in the TableTouch property.
I want the TableName nodes to connect to other TableName nodes where the TableName of one node is found in the TableName.TableTouch of another node. How would I go about doing this? Do I need to have my .CSV formatted differently for this?
Thanks,
-Andrew
Edit: This may make things more clear
What I have:
What I'd like to have (red arrows):
[UPDATED]
If I understand your scenario, you want to represent the Script that is used to generate each Table, and what other table was used by that Script.
And, if I understand the meaning of your CSV file and your pictures, it looks like the Source table is generated by ScriptA without using data from any other tables. If so, you can create your CSV file to look something like this (where the Source table row's TableTouch column has the special value NOTHING -- you can use some other name -- to indicate that column actually has no value):
TableName,ScriptName,TableTouch
Source,ScriptA,NOTHING
Water,ScriptB,Source
Oil,ScriptC,Source
Data model:
(src:Table {name: 'Source'})<-[:USES]-(s:Script {name: 'ScriptC'})-[:MAKES]->(dest:Table {name: 'Oil'})
Note: This data model allows a single Script to "use" any number of source Tables and "make" any number of destination Tables.
Create Constraints
CREATE CONSTRAINT ON (t:Table) ASSERT t.name IS UNIQUE;
CREATE CONSTRAINT ON (s:Script) ASSERT s.name IS UNIQUE;
Import data
LOAD CSV WITH HEADERS FROM "file:///C:\\NeoTest.CSV" AS line
MERGE (src:Table {name: line.TableTouch})
MERGE (dest:Table {name: line.TableName})
MERGE (script:Script {name: line.ScriptName})
MERGE (script)-[:USES]->(src)
MERGE (script)-[:MAKES]->(dest)
Note: To keep the query simple, we just go ahead and create (at most) one NOTHING node to represent the absence of a source Table.
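If a TableTouch cell can list several tables, as in the asker's sample row "Water/Oil", one way to extend the import is to split the cell and UNWIND it. This is only a sketch, assuming '/' is the separator and that every row has a TableTouch value (for example NOTHING, as above):

```cypher
// Assumes '/' separates multiple source tables in TableTouch
LOAD CSV WITH HEADERS FROM "file:///C:\\NeoTest.CSV" AS line
MERGE (dest:Table {name: line.TableName})
MERGE (script:Script {name: line.ScriptName})
MERGE (script)-[:MAKES]->(dest)
WITH script, split(line.TableTouch, '/') AS touched
UNWIND touched AS srcName
MERGE (src:Table {name: srcName})
MERGE (script)-[:USES]->(src)
```

Each name in the cell gets its own Table node and its own USES relationship, which fits the note above that a single Script can use any number of source Tables.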
Results