Import data from 2 csv files in neo4j - neo4j

As a follow-up to this post, in which I explained in full what I'm trying to do: if my central node is defined in a separate .csv file, how can I import it into my graph?
The content of names.csv (2 columns: Lname & Fname):
Lname,Fname
Brown,Helen
Right,Eliza
Green,Helen
Pink,Kate
Yellow,Helen
The content of central.csv (2 columns: central & value):
central,value
cent1,10
I tried something like this:
LOAD CSV WITH HEADERS FROM 'file:///central.csv' AS frow
MERGE (c:center {name: frow.central})
WITH *
LOAD CSV WITH HEADERS FROM 'file:///names.csv' AS srow
WITH srow.Fname AS first, srow.Lname AS last
MERGE (p:la {last: last})
MERGE (o:fi {first: first})
MERGE (c)-[r:CONTAINS {first:first}]->(o)
MERGE (o)-[rel:CONTAINS {first: first}]->(p)
RETURN count(o)
But it didn't work. It created the central node, but the central node is not connected to the first-name nodes as it should be. What's wrong?

Your second WITH clause does not contain c, so c becomes an unbound variable after that clause.
Change this:
WITH srow.Fname AS first, srow.Lname AS last
to this:
WITH c, srow.Fname AS first, srow.Lname AS last
By the way, your query will only work as you expect if central.csv contains just one data row (as it does currently).

Related

Get the leaves of every node in Neo4j

For example, take this data, which is stored in a CSV file:
source,child
A,B
B,C
C,D
X,Y
Y,Z
And I load it like this:
LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS line
MERGE (s:src {id: line.source})
MERGE (d:dst {id: line.child})
CREATE (s)-[:FEEDs_INTO]->(d)
In my example we have 2 leaves - A and X, but there may be multiple leaves for one node. Now I want to get every leaf-node connection. So for my example I want something like this:
A,B
A,C
A,D
X,Y
X,Z
How can I do it?
Your data model can be improved. The same id can appear in both columns (B is a child of A and the source of C), so with separate src and dst labels it ends up as two unconnected nodes. Give all of them a single label instead (say, "node"). Your loading script then becomes:
LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS line
MERGE (s:node {id: line.source})
MERGE (d:node {id: line.child})
MERGE (s)-[:FEEDs_INTO]->(d)
Then your query will be as simple as below:
MATCH (child:node)-[:FEEDs_INTO*]->(parent:node)
WHERE NOT EXISTS((:node)-->(child))
RETURN child, parent
Here, the * in the relationship pattern makes it a variable-length match: the path may be one hop or any number of hops long. That is how the query goes from A to B (length 1), then to C (length 2), then to D (length 3), and so on. The WHERE clause ensures that child has no incoming relationship, i.e. that it is one of the starting nodes (A or X in your example).
Result:
╒══════════╤══════════╕
│"child"   │"parent"  │
╞══════════╪══════════╡
│{"id":"A"}│{"id":"B"}│
├──────────┼──────────┤
│{"id":"A"}│{"id":"C"}│
├──────────┼──────────┤
│{"id":"A"}│{"id":"D"}│
├──────────┼──────────┤
│{"id":"X"}│{"id":"Y"}│
├──────────┼──────────┤
│{"id":"X"}│{"id":"Z"}│
└──────────┴──────────┘
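If it helps to check the expected rows before loading anything, the same traversal can be sketched outside Neo4j. This is only an illustration in plain Python, not part of the answer above: the function name `descendants_of_roots` is made up, and the edge list is the sample data from the question. Unlike the Cypher query, which returns one row per path, this sketch reports each pair once.

```python
from collections import defaultdict

def descendants_of_roots(edges):
    """Return (root, descendant) pairs for every node with no incoming edge."""
    children = defaultdict(list)
    has_parent = set()
    for source, child in edges:
        children[source].append(child)
        has_parent.add(child)

    # A root is a source that never appears in the child column.
    roots = sorted(n for n in children if n not in has_parent)
    pairs = []
    for root in roots:
        # Iterative depth-first walk; `seen` guards against cycles.
        seen = set()
        stack = list(reversed(children[root]))
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            pairs.append((root, node))
            stack.extend(reversed(children[node]))
    return pairs

# Sample data from the question (source, child).
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("X", "Y"), ("Y", "Z")]
pairs = descendants_of_roots(edges)
for root, node in pairs:
    print(f"{root},{node}")
```

For the sample data this prints the five lines requested in the question (A,B through X,Z).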

Efficient way to import multiple csv's in neo4j

I am working on creating a graph database in neo4j for a CALL dataset. The dataset is stored in a CSV file with the following columns: Source, Target, Timestamp, Duration. Here Source and Target are person IDs (numeric), Timestamp is a datetime and Duration is in seconds (integer).
I modeled my graph with persons as nodes (person_id as a property) and calls as relationships (time and duration as properties).
There are around 200,000 nodes and around 70 million relationships. I have a separate CSV file with the person IDs, which I used to create the nodes. I also added a uniqueness constraint on the person IDs.
CREATE CONSTRAINT ON ( person:Person ) ASSERT (person.pid) IS UNIQUE
I didn't completely understand how bulk import works, so I wrote a Python script to split my CSV into 70 CSVs of 1 million rows each (saved as calls_0, calls_1, ..., calls_69). I then ran a Cypher query manually, changing the filename every time. It worked well (fast enough) for the first few files (around 10), but then I noticed that after adding the relationships from one file, the import gets slower for the next file. Now it takes almost 25 minutes to import a file.
Can someone link me to an efficient and easy way of doing it?
Here is the cypher query:
:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///calls/calls_28.csv' AS line
WITH toInteger(line.Source) AS Source,
datetime(replace(line.Time,' ','T')) AS time,
toInteger(line.Target) AS Target,
toInteger(line.Duration) AS Duration
MATCH (p1:Person {pid: Source})
MATCH (p2:Person {pid: Target})
MERGE (p1)-[rel:CALLS {time: time, duration: Duration}]->(p2)
RETURN count(rel)
I am using Neo4j 4.0.3
Your MERGE clause has to check for an existing matching relationship (to avoid creating duplicates). If you added a lot of relationships between Person nodes, that could make the MERGE clause slower.
You should consider whether it is safe for you to use CREATE instead of MERGE.
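One way to judge whether CREATE is safe is to check the input for repeated rows before importing. The following is a rough sketch, not something from the original answer: the function name `duplicate_rows` and the sample rows are made up, and the column names (Source, Target, Time, Duration) are assumed from the query in the question. It only inspects one file at a time, so duplicates spread across different calls_N.csv files would not be found, and a 1-million-row file will take a noticeable amount of memory.

```python
import csv
import io
from collections import Counter

def duplicate_rows(csv_file):
    """Return rows that occur more than once, mapped to their counts."""
    reader = csv.DictReader(csv_file)
    counts = Counter(
        (row["Source"], row["Target"], row["Time"], row["Duration"])
        for row in reader
    )
    return {key: n for key, n in counts.items() if n > 1}

# Made-up sample standing in for one of the calls_N.csv files.
sample = io.StringIO(
    "Source,Target,Time,Duration\n"
    "1,2,2020-01-01 10:00:00,30\n"
    "1,2,2020-01-01 10:00:00,30\n"
    "2,3,2020-01-01 11:00:00,45\n"
)
dupes = duplicate_rows(sample)
print(dupes)
```

If the result is empty for every file (and no file is imported twice), replacing MERGE with CREATE should produce the same graph without the per-row existence check.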
It is much better to first export the matched pairs using the internal ID of each node, and then create the relationships from that export.
Proof of concept:
CREATE INDEX ON :Person(`pid`);
CALL apoc.export.csv.query("LOAD CSV WITH HEADERS FROM 'file:///calls/calls_28.csv' AS line
WITH toInteger(line.Source) AS Source,
datetime(replace(line.Time,' ','T')) AS time,
toInteger(line.Target) AS Target,
toInteger(line.Duration) AS Duration
MATCH (p1:Person {pid: Source})
MATCH (p2:Person {pid: Target})
RETURN ID(p1) AS ida, ID(p2) AS idb, time, Duration", "rels.csv", {});
and then
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///rels.csv' AS row
MATCH (a:Person) WHERE ID(a) = toInteger(row.ida)
MATCH (b:Person) WHERE ID(b) = toInteger(row.idb)
MERGE (a)-[:CALLS {time: datetime(row.time), duration: toInteger(row.Duration)}]->(b);
For me this is the best way to do this.

several relationships from a node to another node

I'm new to neo4j. I have a .csv file with two columns separated by ",". The first column contains the last names and the second column contains the first names:
Lname,Fname
Brown,Helen
Right,Eliza
Green,Helen
Pink,Kate
Yellow,Helen
I want to create nodes for the Lname column and nodes for the Fname column. For rows that share the same Fname, I want to connect each Lname to that Fname. For example, I want a single "Helen" node with the three nodes "Brown", "Green" and "Yellow" connected to it. I also want to connect the Fname nodes to a "central node". I have written this code:
LOAD CSV WITH HEADERS FROM 'file:///names.csv' AS row
WITH row.Fname AS first, row.Lname AS last
MERGE (p:la {last: last})
MERGE (o:fi {first: first})
MERGE (c:central {name: "central node"})
MERGE (c)-[r:CONTAINS {first:first}]->(o)-[rel:CONTAINS {first: first}]->(p)
RETURN count(o)
When I run this code and display the output using this query:
MATCH (c:central)-[r:CONTAINS]->(o:fi)-[rel:CONTAINS]->(p:la)
RETURN c, r, o, rel, p
In the resulting graph, each first-name node gets one relationship from the central node per last name. For example, there are 3 relationships from "central node" to "Helen", but I want only one. What's wrong here?
The answer lies in your final MERGE clause.
MERGE (c)-[r:CONTAINS {first:first}]->(o)-[rel:CONTAINS {first: first}]->(p)
MERGE matches or creates the entire pattern as a unit. Because the last name changes on every row, the full pattern never already exists, so the whole thing is created again each time, including a new relationship from the central node to the first-name node. If you want a single relationship from the central node to each first-name node, split the pattern into two separate MERGE clauses. With the following, the first MERGE creates the central-to-first relationship only once.
MERGE (c)-[r:CONTAINS {first:first}]->(o)
MERGE (o)-[rel:CONTAINS {first: first}]->(p)
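The difference between the two forms can be imitated with a toy model. The sketch below is plain Python, not Neo4j: it treats a relationship as a (start, end) pair, ignores relationship properties, and the function names are made up for illustration. It only mimics the rule described above, namely that a pattern that does not match in full is created in full.

```python
# Sample rows from names.csv as (Lname, Fname).
rows = [("Brown", "Helen"), ("Right", "Eliza"), ("Green", "Helen"),
        ("Pink", "Kate"), ("Yellow", "Helen")]

def merge_whole_pattern(rows):
    """Toy model of a single MERGE over (c)-->(o)-->(p)."""
    rels = []  # a list, so parallel relationships are possible
    for last, first in rows:
        matched = ("central", first) in rels and (first, last) in rels
        if not matched:
            # No full match: both relationships are created again.
            rels.append(("central", first))
            rels.append((first, last))
    return rels

def merge_split(rows):
    """Toy model of two MERGE clauses, one per relationship."""
    rels = []
    for last, first in rows:
        for rel in (("central", first), (first, last)):
            if rel not in rels:
                rels.append(rel)
    return rels

whole = merge_whole_pattern(rows).count(("central", "Helen"))
split = merge_split(rows).count(("central", "Helen"))
print(whole, split)
```

With the sample names this prints `3 1`: three central-to-Helen relationships for the single-pattern form and one for the split form.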

Loading sparse adjacency matrix on Neo4j

I'm trying to load a sparse (co-occurrence) matrix in Neo4j but after many failed queries, it's getting frustrating.
Raw data
Basically, I want to create the nodes from the ids, and the relationship weight against each other node (including itself) should be the value on the matrix.
So, for example, 'nhs' should have a self-relationship with weight 41 and 16 with 'england', and so on.
I was trying things like:
LOAD CSV WITH HEADERS FROM 'file:///matpharma.csv' AS row
MERGE (a: node{name: row.id})
MERGE (b: node{name: row.key})
MERGE (a)-[:w]-(b);
I'm not sure how to attach the edge values though (and not yet sure if the merges are producing the expected result).
Thanks in advance for the assistance
If you just need to add a property on a relationship, where the property value is in your CSV, then it's just a matter of adding a variable for the relationship that you MERGE in, and then using SET (or ON CREATE SET, if you only want to set the property if the relationship didn't exist and needed to be created). So something like:
LOAD CSV WITH HEADERS FROM 'file:///matpharma.csv' AS row
MERGE (a: node{name: row.id})
MERGE (b: node{name: row.key})
MERGE (a)-[r:w]-(b)
SET r.weight = row.weight
EDIT
Ah, I took a look at the CSV clip. This is a very strange way to format your data. You have data in your header (that is, your headers define the other node to look up), which is the wrong way to go about this. You should instead have, per row, one column that identifies one of the two nodes to connect (like the "id" column) and another column for the other node (something like "id2"). That way you can do two MATCHes to get your nodes, then a MERGE between them, and then set the relationship property, similar to the sample query I provided above.
But if you're set on this format, then it's going to be a more complicated query, since we have to deal with dynamic access of the row keys and values.
Something like:
LOAD CSV WITH HEADERS FROM 'file:///matpharma.csv' AS row
MERGE (start:Node {name:row.id})
WITH start, row, [key in keys(row) WHERE key <> 'id'] as keys
FOREACH (key in keys |
MERGE (end:Node {name:key})
MERGE (start)-[r:w]-(end)
ON CREATE SET r.weight = row[key] )
This is a nice Cypher challenge :) LOAD CSV is not really meant to do this, and you would probably be happier flattening your data.
Here is what I came up with:
LOAD CSV FROM "https://gist.githubusercontent.com/ikwattro/a5260d131f25bcce97c945cb97bc0bee/raw/4ce2b3421ad80ca946329a0be8a6e79ca025f253/data.csv" AS row
WITH collect(row) AS rows
WITH rows, rows[0] AS firstRow
UNWIND rows AS row
WITH firstRow, row SKIP 1
UNWIND range(0, size(row)-2) AS i
RETURN firstRow[i+1], row[0], row[i+1]
You can take a look at the gist
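Flattening the matrix before loading, as both answers suggest, could be done with a short script along these lines. This is a sketch under assumptions: the label column is assumed to be called `id` (as in the queries above), empty cells and "0" are assumed to mean "no edge", and the sample matrix is made up apart from the nhs/england values quoted in the question. The function name `flatten_matrix` is hypothetical.

```python
import csv
import io

def flatten_matrix(matrix_file, out_file):
    """Write one id,key,weight row per non-empty matrix cell."""
    reader = csv.DictReader(matrix_file)
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["id", "key", "weight"])
    for row in reader:
        for key, value in row.items():
            if key == "id" or value in ("", "0"):
                continue  # skip the label column and empty cells
            writer.writerow([row["id"], key, value])

# Made-up sample; only the nhs/england values come from the question.
matrix = io.StringIO(
    "id,nhs,england,health\n"
    "nhs,41,16,\n"
    "england,16,9,0\n"
    "health,,0,5\n"
)
flat = io.StringIO()
flatten_matrix(matrix, flat)
print(flat.getvalue())
```

The output has the id, key and weight columns that the first answer's query reads as row.id, row.key and row.weight. LOAD CSV reads every field as a string, so the weight would likely need toInteger(row.weight) when it is set.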

Neo4j - Load CSV and chain relationship

I want to load a CSV containing a timeline of ordered events to create a list of nodes, but I'm having trouble creating a :NEXT relationship to link two consecutive rows.
LOAD CSV WITH HEADERS FROM "file:////events.csv" AS row
merge (:Event{id:row.id})-[:NEXT]-> ??? (:Event {id:row[+1].id)
I suppose one approach is to have a column in the CSV pointing to the next row id.
The following queries assume the nodes already exist. If you also want to create the nodes as necessary, replace MATCH with MERGE.
Option 1:
You can have each row in the CSV file contain a variable number of node ids for the nodes that need to be connected together in a single chain, in order. In this case, the CSV file should not have a header row.
LOAD CSV FROM "file:///events.csv" AS ids
UNWIND [i IN RANGE(1, SIZE(ids)-1) | {a: ids[i-1], b: ids[i]}] AS pair
MATCH (a:Event {id: pair.a})
MATCH (b:Event {id: pair.b})
MERGE (a)-[:NEXT]->(b)
Option 2:
You can have each row in the CSV file contain just a pair of node ids that need to be connected together, in order. In this case, the CSV file could have a header row, as demonstrated by this example (using a and b as the headers).
LOAD CSV WITH HEADERS FROM "file:///events.csv" AS pair
MATCH (a:Event {id: pair.a})
MATCH (b:Event {id: pair.b})
MERGE (a)-[:NEXT]->(b)
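If the events file has one event per row, the pair file for Option 2 can be derived from it before loading. A minimal sketch, assuming the file has an `id` column and that row order is the timeline order; the `name` column, the sample ids and the function name `write_pairs` are made up for illustration.

```python
import csv
import io

def write_pairs(events_file, out_file):
    """Write each event id next to the id of the row that follows it."""
    ids = [row["id"] for row in csv.DictReader(events_file)]
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["a", "b"])  # headers expected by the Option 2 query
    for current, following in zip(ids, ids[1:]):
        writer.writerow([current, following])

# Made-up sample: three events in timeline order.
events = io.StringIO("id,name\ne1,start\ne2,middle\ne3,end\n")
pairs = io.StringIO()
write_pairs(events, pairs)
print(pairs.getvalue())
```

The last event has no successor, so a file with n events yields n-1 pairs.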
