I am loading simple CSV data into Neo4j. The data looks like this:
uniqueId compound value category
ACT12_M_609 mesulfen 21 carbon
ACT12_M_609 MNAF 23 carbon
ACT12_M_609 nifluridide 20 suphate
ACT12_M_609 sulfur 23 carbon
I am loading the data from the URL using the following query -
LOAD CSV WITH HEADERS
FROM "url"
AS row
MERGE (t:Transaction {transactionId: row.uniqueId})
MERGE (c:Compound {name: row.compound})
MERGE (t)-[r:CONTAINS]->(c)
ON CREATE SET c.category = row.category, r.price = row.value
Next, I run an aggregation to count the total orders per compound and store the result as a property on the compound node:
MATCH (c:Compound)<-[:CONTAINS]-(t:Transaction)
// c itself must be carried through WITH so that SET can still refer to it
WITH c, count(DISTINCT t.transactionId) AS ord
SET c.orders = ord
So far so good. I can accomplish what I want, but I have the following 2 questions:
1. How can I create the orders property for the compound node in the first step itself? I.e. when I am loading the data, I would like to perform the aggregation straight away.
2. For a compound node I am also setting the category property. Theoretically, it could also be modelled as category -contains-> compound by creating a Category node. But what advantage would I gain from that? I can already execute my queries and get the expected output without creating this additional node.
Thank you for your answer.
On the first question: I don't think that's possible. LOAD CSV processes one row at a time, so at row 1 it doesn't know how many more rows will follow.
I guess you could create virtual nodes and relationships, aggregate those, and then use the result to create the real nodes, but that would be far more complicated (see Virtual Nodes/Rels).
On the second question: that depends on the queries you want to run.
A graph database is optimised for following relationships, so if you often run a query where the category is a criterion (e.g. MATCH (c:Category {category_id: 12})-[r]-(:Compound) ), it might be more performant to model the category as its own node.
If you just want the category in the results (e.g. RETURN compound.category), then it's fine as a property.
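As a rough sketch of the separate-node option, the load could create one Category node per distinct value and link it to the compound. The Category label, its name property and the reuse of the CONTAINS relationship type are assumptions for illustration, and "url" is the placeholder from the question.

```cypher
// Assumed model: (:Category {name})-[:CONTAINS]->(:Compound {name})
LOAD CSV WITH HEADERS FROM "url" AS row
MERGE (c:Compound {name: row.compound})
MERGE (cat:Category {name: row.category})
MERGE (cat)-[:CONTAINS]->(c);

// A category filter then starts from a single Category node
// and follows its relationships, instead of checking every Compound.
MATCH (:Category {name: 'carbon'})-[:CONTAINS]->(c:Compound)
RETURN c.name;
```

Whether this pays off depends on how often you filter by category; for simply returning the category next to the compound, the property version stays simpler.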
I just downloaded and installed Neo4j. Now I'm working with a simple CSV file that looks like this:
First, I use this query to merge the nodes for that file:
LOAD CSV WITH HEADERS FROM 'file:///Athletes.csv' AS line
MERGE(Rank:rank{rang: line.Rank})
MERGE(Name:name{nom: line.Name})
MERGE(Sport:sport{sport: line.Sport})
MERGE(Nation:nation{pays: line.Nation})
MERGE(Gender: gender{genre: line.Gender})
MERGE(BirthDate:birthDate{dateDeNaissance: line.BirthDate})
MERGE(BirthPlace: birthplace{lieuDeNaissance: line.BirthPlace})
MERGE(Height: height{taille: line.Height})
MERGE(Pay: pay{salaire: line.Pay})
and this to create some constraints for that file:
CREATE CONSTRAINT ON(name:Name) ASSERT name.nom IS UNIQUE
CREATE CONSTRAINT ON(rank:Rank) ASSERT rank.rang IS UNIQUE
Then I want to display which country each athlete belongs to. For that I use:
Create(name)-[:WORK_AT]->(nation)
But this is what appears:
I would like to know why I get that result, please.
Thanks in advance to anyone who takes the time to help me.
Several issues come to mind:
If your CREATE clause is part of your first query: since the CREATE clause uses the variable names name and nation, and your MERGE clauses use Name and Nation (which have different casing) -- the CREATE clause would just create new nodes instead of using the Name and Nation nodes.
If your CREATE clause is NOT part of your first query: your CREATE clause would just create new nodes (since variable names, even assuming they had the same casing, are local to a query and are not stored in the DB).
Solution: You can add this clause to the end of the first query:
CREATE (Name)-[:WORK_AT]->(Nation)
Yes, I agree with @cybersam: it's a case-sensitivity issue with the 'name' and 'nation' variables.
My suggestion:
MERGE (Name)-[:WORK_AT]->(Nation)
I see that you're using MERGE for the nodes, so in case any values of Name or Nation are duplicated in the file, you should use MERGE instead of CREATE for the relationship as well.
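Put together, a trimmed version of the import might look like the sketch below. It keeps only the two node types involved in the relationship; the remaining MERGE clauses from the question can stay in place before the final line.

```cypher
LOAD CSV WITH HEADERS FROM 'file:///Athletes.csv' AS line
MERGE (Name:name {nom: line.Name})
MERGE (Nation:nation {pays: line.Nation})
// Same variables, same casing, same query: the nodes bound above are reused
MERGE (Name)-[:WORK_AT]->(Nation)
```

Because the relationship clause runs once per CSV row inside the same query, Name and Nation always refer to the nodes merged for that row.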
I am importing the following into Neo4j:
categories.csv
CategoryName1
CategoryName2
CategoryName3
...
categories_relations.csv
category_parent category_child
CategoryName3 CategoryName10
CategoryName32 CategoryName41
...
Basically, categories_relations.csv shows parent-child relationships between the categories from categories.csv.
I imported the first csv file with the following query which went well and pretty quickly:
USING PERIODIC COMMIT
LOAD CSV FROM 'file:///categories.csv' as line
CREATE (:Category {name:line[0]})
Then I imported the second csv file with:
USING PERIODIC COMMIT
LOAD CSV FROM 'file:///categories_relations.csv' as line
MATCH (a:Category),(b:Category)
WHERE a.name = line[0] AND b.name = line[1]
CREATE (a)-[r:ISPARENTOF]->(b)
I have about 2 million nodes.
I tried executing the 2nd query and it is taking quite long. Can I make the query execute more quickly?
Confirm you are matching on the right property. You set only one property on the Category nodes, i.e. name, when creating the categories, so the second query must match on name and not on any other property such as id.
To make the second query faster, add an index on the property you match Category nodes on (here name).
CREATE INDEX ON :Category(name)
If it still takes long, you can refer to my answer on LOAD CSV here
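Assuming the nodes are matched on name, as in the question's queries, the indexed import could be sketched as follows. It uses the same older syntax as the rest of this thread; recent Neo4j versions spell index creation as CREATE INDEX FOR (c:Category) ON (c.name), so adjust to your version.

```cypher
// Run once, before the relationship import
CREATE INDEX ON :Category(name);

USING PERIODIC COMMIT
LOAD CSV FROM 'file:///categories_relations.csv' AS line
// Two separate lookups, each of which can be answered from the index
MATCH (a:Category {name: line[0]})
MATCH (b:Category {name: line[1]})
CREATE (a)-[:ISPARENTOF]->(b)
```

Without the index, each row has to scan the Category nodes to find both endpoints, which is the likely reason the original query is slow at around 2 million nodes.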
I'm trying to solve a problem with displaying a one-to-many relationship in Neo4j. My dataset is below:
child,desc,type,parent
1,PGD,Exchange,0
2,MSE 1,MSE,1
3,MSE 2,MSE,1
4,MSE 3,MSE,1
5,MSE 4,MSE,1
6,BRAS 1,BRAS,2
6,BRAS 1,BRAS,3
7,BRAS 2,BRAS,4
7,BRAS 2,BRAS,5
10,NPE 1,NPE,6
11,NPE 2,NPE,7
12,OLT,OLT,10
12,OLT,OLT,11
13,FDC,FDC,12
14,FDP,FDP,13
15,Cust 1,Customer,14
16,Cust 2,Customer,14
17,Cust 3,Customer,14
LOAD CSV WITH HEADERS FROM 'file:///FTTH_sample.csv' AS line
CREATE(:ftthsample
{child_id:line.child,
desc:line.desc,
type:line.type,
parent_id:line.parent});
//Relations
match (child:ftthsample),(parent:ftthsample)
where child.child_id=parent.parent_id
create (child)-[:test]->(parent)
//Query:
MATCH (child)-[childrel:test*]-(elem)-[parentrel:test*]->(parent)
WHERE elem.desc='FDP'
RETURN child,childrel,elem,parentrel
It returns the display below.
I want the duplicate nodes to be displayed as one. I'm a newbie with Neo4j. Can any of the experts help, please?
This seems like an error in your graph creation query. A few lines in your CSV specify the same node multiple times, but with different parents:
6,BRAS 1,BRAS,2
6,BRAS 1,BRAS,3
I'm guessing you actually want this to be a single node, with parent relationships to nodes with the given parent ids, instead of two separate nodes.
Let's adjust your import query. Instead of using CREATE on each line, we'll use MERGE, and only on the child_id, which seems to be your primary key (maybe consider just using id instead, as a node can have an id on its own, without having to consider whether it's a parent or a child). We can use the ON CREATE clause after MERGE to add the remaining properties only if the MERGE resulted in node creation (instead of matching an existing node).
That will ensure we only have one node created per child_id.
Rather than having to rematch the child, we can use the child node we just created, match on the parent, and create the relationship.
LOAD CSV WITH HEADERS FROM 'file:///FTTH_sample.csv' AS line
MERGE(child:ftthsample {child_id:line.child})
ON CREATE SET
child.desc = line.desc,
child.type = line.type
WITH child, line.parent as parentId
MATCH (parent:ftthsample)
WHERE parent.child_id = parentId
MERGE (child)-[:test]->(parent)
Note that we haven't added line.parent as a property. It's not needed, since we only use that to create relationships, and after the relationships are there, we won't need those again.
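After re-importing, a quick check like the sketch below can confirm that no child_id occurs more than once. The uniqueness constraint is an optional addition of mine, not part of the query above, and it is written in the older constraint syntax used elsewhere on this page.

```cypher
// Should return no rows if every child_id maps to exactly one node
MATCH (n:ftthsample)
WITH n.child_id AS id, count(*) AS copies
WHERE copies > 1
RETURN id, copies;

// Optional: enforce uniqueness from now on. This fails while duplicates
// from the earlier CREATE-based import are still in the database.
CREATE CONSTRAINT ON (n:ftthsample) ASSERT n.child_id IS UNIQUE;
```

If the first query does return rows, the nodes from the earlier import are still present and need to be removed before the MERGE-based import gives a clean graph.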
I'm very new to Neo4j, been playing around with it for a couple of days now.
I'm trying to use Neo4j to map our company's database by showing how one table is related to another (data is pulled to or pushed from one table to another) and what scripts are used to do this pulling and pushing. To do this, I'm using three different properties: TableName, ScriptName, and TableTouch.
TableName: Table node which corresponds to the name of a table
ScriptName: Script node which corresponds to the script which updates a table
TableTouch: Used to show which table affects another table
Here is an example of the .CSV I'm importing:
TableName ScriptName TableTouch
Source ScriptA Water/Oil
Water ScriptB Source
Oil ScriptC Source
Here is the code I have thus far:
CREATE CONSTRAINT ON (c:Table) ASSERT c.TableName IS UNIQUE;
CREATE CONSTRAINT ON (c:Scripts) ASSERT c.ScriptName IS UNIQUE;
LOAD CSV WITH HEADERS FROM
"file:///C:\\NeoTest.CSV" AS line
MERGE (table:Table {TableName: UPPER(line.TableName)})
SET table.TableTouch = UPPER(line.TableTouch)
MERGE (script:Scripts {ScriptName: UPPER(line.ScriptName)})
MERGE (table) - [:UPDATED_BY] -> (script)
This will relate scripts to their appropriate tables and load in all the table and script nodes.
Now an example of what I need is for Node "Source" to connect to Node "Water" because "Source" = Water.TableTouch and "Water" = Source.TableTouch.
Assume any given table could have multiple tables listed in the TableTouch property.
I want the TableName nodes to connect to other TableName nodes where the TableName of one node is found in the TableName.TableTouch of another node. How would I go about doing this? Do I need to have my .CSV formatted differently for this?
Thanks,
-Andrew
Edit: This may make things more clear
What I have:
What I'd like to have (red arrows):
[UPDATED]
If I understand your scenario, you want to represent the Script that is used to generate each Table, and what other table was used by that Script.
And, if I understand the meaning of your CSV file and your pictures, it looks like the Source table is generated by ScriptA without using data from any other tables. If so, you can create your CSV file to look something like this (where the Source table row's TableTouch column has the special value NOTHING -- you can use some other name -- to indicate that column actually has no value):
TableName,ScriptName,TableTouch
Source,ScriptA,NOTHING
Water,ScriptB,Source
Oil,ScriptC,Source
Data model:
(src:Table {name: 'Source'})<-[:USES]-(s:Script {name: 'ScriptC'})-[:MAKES]->(dest:Table {name: 'Oil'})
Note: This data model allows a single Script to "use" any number of source Tables and "make" any number of destination Tables.
Create Constraints
CREATE CONSTRAINT ON (t:Table) ASSERT t.name IS UNIQUE;
CREATE CONSTRAINT ON (s:Script) ASSERT s.name IS UNIQUE;
Import data
LOAD CSV WITH HEADERS FROM "file:///C:\\NeoTest.CSV" AS line
MERGE (src:Table {name: line.TableTouch})
MERGE (dest:Table {name: line.TableName})
MERGE (script:Script {name: line.ScriptName})
MERGE (script)-[:USES]->(src)
MERGE (script)-[:MAKES]->(dest)
Note: To keep the query simple, we just go ahead and create (at most) one NOTHING node to represent the absence of a source Table.
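If a TableTouch cell can list several tables, one possible variation splits the cell before merging the source tables. This is only a sketch: the '/' separator is an assumption taken from the Water/Oil value in the question's sample, and srcName is a name introduced here.

```cypher
LOAD CSV WITH HEADERS FROM "file:///C:\\NeoTest.CSV" AS line
MERGE (dest:Table {name: line.TableName})
MERGE (script:Script {name: line.ScriptName})
MERGE (script)-[:MAKES]->(dest)
WITH line, script
// Assumption: multiple source tables are separated by '/'
UNWIND split(line.TableTouch, '/') AS srcName
MERGE (src:Table {name: srcName})
MERGE (script)-[:USES]->(src);

// Table-to-table view: which table feeds which, and through which script
MATCH (src:Table)<-[:USES]-(s:Script)-[:MAKES]->(dest:Table)
WHERE src.name <> 'NOTHING'
RETURN src.name AS source, s.name AS script, dest.name AS destination;
```

With the three-row CSV shown above, the second query should list Source feeding Water through ScriptB and Source feeding Oil through ScriptC, while the NOTHING placeholder is filtered out.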
Results