I have a CSV file with the columns and sample data below. I've loaded it into Neo4j but got stuck when trying to create relationships.
**source destination miles**
a b 5
a c 6
a d 20
Now I want to create a star graph with the source in the middle, the destinations connected around it, and each relationship labeled with the miles between the two stops. I tried the query below, but it does not return the miles on the relationship. I'm new to Neo4j, so any help is appreciated. Thanks in advance.
LOAD CSV WITH HEADERS FROM "file:///sample.csv" AS line
CREATE (s:src{id: line.source})
CREATE (d:dst{id: line.destination})
CREATE (s)-[r:trips {total: [line.miles]}]->(d)
RETURN s, d, r;
By default, LOAD CSV expects the CSV file to use comma separators, and it does not support extraneous whitespace. Try changing the content of your CSV file to this:
source,destination,miles
a,b,5
a,c,6
a,d,20
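If the original file really is separated by runs of spaces, a small Python sketch like the one below can rewrite it as comma-separated text before loading. It assumes that no field contains a space itself, which holds for the sample. The function name `spaces_to_csv` is hypothetical.

```python
import csv
import io


def spaces_to_csv(text: str) -> str:
    """Convert whitespace-separated lines into comma-separated CSV text.

    Assumes fields never contain spaces themselves (true for the sample).
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        fields = line.split()  # split on any run of whitespace
        if fields:  # skip blank lines
            writer.writerow(fields)
    return out.getvalue()


sample = "source destination miles\na b 5\na c 6\na d 20\n"
print(spaces_to_csv(sample), end="")
```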
Also, you should use MERGE instead of CREATE to avoid creating duplicate nodes. And there is no evident need to store the miles value in an array, so this query stores it as a scalar value:
LOAD CSV WITH HEADERS FROM "file:///sample.csv" AS line
MERGE (s:src {id: line.source})
MERGE (d:dst {id: line.destination})
CREATE (s)-[r:trips {miles: line.miles}]->(d)
RETURN s, d, r;
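To see why MERGE matters here, the plain-Python model below replays the three sample rows. It is not Neo4j code: the `load` function and its list-based "graph" are illustrative assumptions.

- **CREATE:** every row makes a new `a` node, so you get three separate source nodes instead of one star center.
- **MERGE:** the existing `a` node is reused.

```python
rows = [
    {"source": "a", "destination": "b", "miles": "5"},
    {"source": "a", "destination": "c", "miles": "6"},
    {"source": "a", "destination": "d", "miles": "20"},
]


def load(rows, use_merge):
    """Return (node list, relationship list) built from the CSV rows."""
    nodes = []  # each node is a (label, id) tuple; duplicates allowed
    rels = []

    def get_node(label, node_id):
        key = (label, node_id)
        if use_merge and key in nodes:
            return nodes.index(key)  # MERGE: reuse the matching node
        nodes.append(key)  # CREATE, or MERGE with no match: add a new node
        return len(nodes) - 1

    for line in rows:
        s = get_node("src", line["source"])
        d = get_node("dst", line["destination"])
        # LOAD CSV yields strings, so miles stays "5", "6", "20"
        rels.append((s, d, {"miles": line["miles"]}))
    return nodes, rels


created, _ = load(rows, use_merge=False)
merged, merged_rels = load(rows, use_merge=True)
print(len(created), len(merged))
```

As the quoted values in the result table show, LOAD CSV reads every field as a string. If you need numeric miles, wrapping the value in Cypher's `toInteger(line.miles)` would convert it.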
The result of the above is:
╒══════════╤══════════╤══════════════╕
│"s"       │"d"       │"r"           │
╞══════════╪══════════╪══════════════╡
│{"id":"a"}│{"id":"b"}│{"miles":"5"} │
├──────────┼──────────┼──────────────┤
│{"id":"a"}│{"id":"c"}│{"miles":"6"} │
├──────────┼──────────┼──────────────┤
│{"id":"a"}│{"id":"d"}│{"miles":"20"}│
└──────────┴──────────┴──────────────┘
For example, take this data, which is stored in a CSV file:
source,child
A,B
B,C
C,D
X,Y
Y,Z
And I load it like this:
LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS line
MERGE (s:src {id: line.source})
MERGE (d:dst {id: line.child})
CREATE (s)-[:FEEDs_INTO]->(d)
In my example there are two leaves, A and X, but there may be multiple leaves for one node. Now I want to get every leaf-to-node connection. For my example, I want something like this:
A,B
A,C
A,D
X,Y
X,Z
How can I do it?
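Your data model can be improved further. A node such as B is both a destination (of A) and a source (of C). Loading it once as :src and once as :dst creates two separate nodes, which breaks the chain A to B to C. Instead, give all of them a single label (say, node). Your loading script can then be as follows: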
LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS line
MERGE (s:node {id: line.source})
MERGE (d:node {id: line.child})
MERGE (s)-[:FEEDs_INTO]->(d)
Then your query can be as simple as this:
MATCH (child:node)-[:FEEDs_INTO*]->(parent:node)
WHERE NOT EXISTS((:node)-->(child))
RETURN child, parent
The * in the relationship pattern means the path can have any length from 1 upward (you can cap it, as in *1..5). This lets the query jump from A to B (length 1), then to C (length 2), then to D (length 3), and so on. The WHERE clause ensures that child is a leaf in your sense: no other node has a relationship pointing to it.
Result:
╒══════════╤══════════╕
│"child"   │"parent"  │
╞══════════╪══════════╡
│{"id":"A"}│{"id":"B"}│
├──────────┼──────────┤
│{"id":"A"}│{"id":"C"}│
├──────────┼──────────┤
│{"id":"A"}│{"id":"D"}│
├──────────┼──────────┤
│{"id":"X"}│{"id":"Y"}│
├──────────┼──────────┤
│{"id":"X"}│{"id":"Z"}│
└──────────┴──────────┘
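The traversal can be modelled in plain Python to check the expected pairs. This sketch assumes the edges are acyclic, as in the sample, and the `leaf_pairs` helper is hypothetical. It does two things:

- It finds the nodes with no incoming edge, which is the same condition as the WHERE clause.
- It walks every path of length 1 or more from each of those nodes, like `-[:FEEDs_INTO*]->`.

```python
from collections import defaultdict


def leaf_pairs(edges):
    """Pair each start node (no incoming edge) with every node reachable from it."""
    children = defaultdict(list)
    has_incoming = set()
    nodes = []
    for src, dst in edges:
        children[src].append(dst)
        has_incoming.add(dst)
        for n in (src, dst):
            if n not in nodes:
                nodes.append(n)

    pairs = []
    for start in nodes:
        if start in has_incoming:
            continue  # mirrors WHERE NOT EXISTS((:node)-->(child))
        stack = list(reversed(children[start]))
        while stack:  # depth-first walk over paths of length 1..n
            node = stack.pop()
            pairs.append((start, node))
            stack.extend(reversed(children[node]))
    return pairs


edges = [("A", "B"), ("B", "C"), ("C", "D"), ("X", "Y"), ("Y", "Z")]
print(leaf_pairs(edges))
```

Like the Cypher query, which returns one row per path, this model emits a pair once per path. If two paths reach the same node, that pair appears twice.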
I'm trying to load a sparse (co-occurrence) matrix in Neo4j but after many failed queries, it's getting frustrating.
Raw data
Basically, I want to create the nodes from the ids. The weight of each node's relationship with every other node (including itself) should be the corresponding value in the matrix.
So, for example, 'nhs' should have a self-relationship with weight 41 and 16 with 'england', and so on.
I was trying things like:
LOAD CSV WITH HEADERS FROM 'file:///matpharma.csv' AS row
MERGE (a: node{name: row.id})
MERGE (b: node{name: row.key})
MERGE (a)-[:w]-(b);
I'm not sure how to attach the edge values though (and not yet sure if the merges are producing the expected result).
Thanks in advance for the assistance
If you just need to add a property on a relationship, where the property value is in your CSV, then it's just a matter of adding a variable for the relationship that you MERGE in, and then using SET (or ON CREATE SET, if you only want to set the property if the relationship didn't exist and needed to be created). So something like:
LOAD CSV WITH HEADERS FROM 'file:///matpharma.csv' AS row
MERGE (a: node{name: row.id})
MERGE (b: node{name: row.key})
MERGE (a)-[r:w]-(b)
SET r.weight = row.weight
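The difference between SET and ON CREATE SET matters when the same pair appears more than once. The plain-Python model below uses a hypothetical `merge_weight` helper and a dict store, not a Neo4j API:

- **SET:** the last row wins.
- **ON CREATE SET:** the first row's value is kept.

The undirected pattern `(a)-[r:w]-(b)` is modelled by keying on an unordered pair.

```python
def merge_weight(rels, a, b, weight, on_create_only):
    """Model MERGE (a)-[r:w]-(b) followed by SET or ON CREATE SET r.weight."""
    key = frozenset((a, b))  # undirected: (a, b) and (b, a) match the same rel
    created = key not in rels
    if created:
        rels[key] = {}
    if created or not on_create_only:
        rels[key]["weight"] = weight
    return rels


rows = [("x", "y", "1"), ("y", "x", "2")]  # hypothetical duplicate pair
with_set = {}
with_on_create = {}
for a, b, w in rows:
    merge_weight(with_set, a, b, w, on_create_only=False)
    merge_weight(with_on_create, a, b, w, on_create_only=True)
print(with_set, with_on_create)
```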
EDIT
Ah, I took a look at the CSV clip. This is an unusual way to format your data. Your headers contain data (each header names the other node to look up), which is the wrong way to go about this. Instead, each row should have one column that identifies one of the two nodes to connect (like the "id" column) and another column for the other node (something like "id2"). That way you can just do two MATCHes to get your nodes, then a MERGE between them, and then set the relationship property, similar to the sample query I provided above.
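If you can preprocess the file, a short Python sketch can produce a one-pair-per-row layout. It makes these assumptions:

- Like the Cypher in this answer, the first column is named id and every other header is a node name.
- The output columns are named id2 (as suggested above) and weight (the property used by the earlier SET query).
- Empty cells are skipped, since the matrix is sparse.

```python
import csv
import io


def flatten_matrix(text: str) -> str:
    """Turn a wide co-occurrence matrix CSV into id,id2,weight rows."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "id2", "weight"])
    for row in reader:
        for key, value in row.items():
            if key == "id" or not value:
                continue  # skip the id column itself and empty (sparse) cells
            writer.writerow([row["id"], key, value])
    return out.getvalue()


# Hypothetical 2x2 matrix; only the 41 and 16 values come from the question.
matrix = "id,nhs,england\nnhs,41,16\nengland,16,\n"
print(flatten_matrix(matrix), end="")
```

The flattened file can then be loaded with the simple MATCH/MERGE/SET pattern described above.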
But if you're set on this format, then it's going to be a more complicated query, since we have to deal with dynamic access of the row keys and values.
Something like:
LOAD CSV WITH HEADERS FROM 'file:///matpharma.csv' AS row
MERGE (start:Node {name:row.id})
WITH start, row, [key in keys(row) WHERE key <> 'id'] as keys
FOREACH (key in keys |
MERGE (end:Node {name:key})
MERGE (start)-[r:w]-(end)
ON CREATE SET r.weight = row[key] )
This is a nice Cypher challenge :) That said, LOAD CSV is not really meant for this, and you would probably be happier flattening your data first.
Here is what I came up with:
LOAD CSV FROM "https://gist.githubusercontent.com/ikwattro/a5260d131f25bcce97c945cb97bc0bee/raw/4ce2b3421ad80ca946329a0be8a6e79ca025f253/data.csv" AS row
WITH collect(row) AS rows
WITH rows, rows[0] AS firstRow
UNWIND rows AS row
WITH firstRow, row SKIP 1
UNWIND range(0, size(row)-2) AS i
RETURN firstRow[i+1], row[0], row[i+1]
You can take a look at the gist.
I want to load a CSV file containing a timeline of ordered events to create a list of nodes, but I'm having trouble creating a :NEXT relationship that links consecutive rows.
LOAD CSV WITH HEADERS FROM "file:////events.csv" AS row
merge (:Event{id:row.id})-[:NEXT]-> ??? (:Event {id:row[+1].id)
I suppose one approach is to have a column in the CSV pointing to the next row id.
The following queries assume the nodes already exist. If you also want to create the nodes as necessary, replace MATCH with MERGE.
Option 1:
You can have each row in the CSV file contain a variable number of node ids for the nodes that need to be connected together in a single chain, in order. In this case, the CSV file should not have a header row.
LOAD CSV FROM "file:///events.csv" AS ids
UNWIND [i IN RANGE(1, SIZE(ids)-1) | {a: ids[i-1], b: ids[i]}] AS pair
MATCH (a:Event {id: pair.a})
MATCH (b:Event {id: pair.b})
MERGE (a)-[:NEXT]->(b)
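The list comprehension in Option 1 pairs each id with the one before it. The sketch below shows the same pairing in plain Python for a hypothetical row of three ids, which is what UNWIND feeds to the MATCH/MERGE lines. Note that Cypher's RANGE is inclusive at both ends, while Python's range excludes the end, so `range(1, len(ids))` yields the same indices.

```python
def chain_pairs(ids):
    """Mirror [i IN RANGE(1, SIZE(ids)-1) | {a: ids[i-1], b: ids[i]}]."""
    return [{"a": ids[i - 1], "b": ids[i]} for i in range(1, len(ids))]


print(chain_pairs(["e1", "e2", "e3"]))
```

A row with a single id produces no pairs, so it creates no relationships.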
Option 2:
You can have each row in the CSV file contain just a pair of node ids that need to be connected together, in order. In this case, the CSV file could have a header row, as demonstrated by this example (using a and b as the headers).
LOAD CSV WITH HEADERS FROM "file:///events.csv" AS pair
MATCH (a:Event {id: pair.a})
MATCH (b:Event {id: pair.b})
MERGE (a)-[:NEXT]->(b)
I am trying to get a csv into Neo4j. As it consists of log entries, I'd like to connect nodes with a NEXT-pointer/relationship when the corresponding logs have been created at subsequent times.
LOAD CSV WITH HEADERS FROM 'http://localhost/Export.csv' AS line
CREATE (:Entry { date: line[0], ...})
MATCH (n)
RETURN n
ORDER BY n:date
MATCH (a:Entry),(b:Entry),(c:Entry)
WITH p AS min(b:date)
WHERE a:date < b:date AND c.date = p
CREATE (a)-[r:NEXT]->(c)
The last four lines do not work however. What I try is to get the earliest entry 'c' out of the group of entries 'b' with a larger timestamp than 'a'. Can anyone help me out here?
Not sure if I understood your question correctly: you have a csv file containing log records with a timestamp. Each line contains one record. You want to interconnect the events to form a linked list based on a timestamp?
In this case I'd split up the process into two steps:
1. Using LOAD CSV, create a node with a date property for each line.
2. Afterwards, connect the entries using, for example, a Cypher statement like this:
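MATCH (e:Entry)
// ascending order, so NEXT points from each entry to the one logged after it
WITH e ORDER BY e.date
WITH collect(e) AS entries
// size() works on lists; length() is meant for paths
FOREACH (i IN range(0, size(entries)-2) |
  FOREACH (e1 IN [entries[i]] |
    FOREACH (e2 IN [entries[i+1]] |
      MERGE (e1)-[:NEXT]->(e2))))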
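One caveat: LOAD CSV reads every value as a string, so ORDER BY e.date sorts text. Text order matches chronological order only for formats such as ISO 8601 (for example 2020-01-31T09:00). The plain-Python sketch below assumes a hypothetical day/month/year format. It shows the difference and links the entries from oldest to newest after parsing the timestamps.

```python
from datetime import datetime


def link_entries(dates, fmt):
    """Return NEXT pairs (earlier, later) after sorting by the parsed timestamp."""
    ordered = sorted(dates, key=lambda d: datetime.strptime(d, fmt))
    return list(zip(ordered, ordered[1:]))


# Hypothetical log timestamps in day/month/year order.
dates = ["02/01/2020", "01/02/2020", "03/01/2020"]
print(sorted(dates))  # text order puts 1 February before 2 January
print(link_entries(dates, "%d/%m/%Y"))
```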
Say I have a CSV file containing node information, each line with a unique id (the first column), and another CSV file describing the edges between the nodes by their unique IDs. The following Cypher code successfully loads the nodes and then creates the edges. However, can I make it more efficient? My real data set has millions of nodes and tens of millions of edges. Obviously I should use periodic commits and create an index. But can I somehow avoid matching for every single edge and use the fact that I know the unique node ids for each edge I want to build? Or am I going about this all wrong? I would like to do this entirely in Cypher (no Java).
load csv from 'file:///home/user/nodes.txt' as line
create (:foo { id: toInt(line[0]), name: line[1], someprop: line[2]});
load csv from 'file:///home/user/edges.txt' as line
match (n1:foo { id: toInt(line[0])} )
with n1, line
match (n2:foo { id: toInt(line[1])} )
// if I had an index I'd use it here with: using index n2:foo(name)
merge (n1) -[:bar]-> (n2) ;
match p = (n)-->(m) return p;
nodes.txt:
0,node0,Some Property 0
1,node1,Some Property 1
2,node2,Some Property 2
3,node3,Some Property 3
4,node4,Some Property 4
5,node5,Some Property 5
6,node6,Some Property 6
7,node7,Some Property 7
8,node8,Some Property 8
9,node9,Some Property 9
10,node10,Some Property 10
...
edges.txt:
0,2
0,4
0,8
0,13
1,4
1,8
1,15
2,4
2,6
3,4
3,7
3,8
3,11
4,10
...
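Conceptually, an index (or uniqueness constraint) on :foo(id) is what lets each MATCH jump straight to a node by id instead of scanning all :foo nodes. The plain-Python sketch below models that. It builds an id-to-node lookup once, then resolves every edge with constant-time lookups. The `Node` class and `load_graph` helper are hypothetical, and this is an analogy, not how Neo4j stores data. The sample rows come from the files above.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    name: str
    someprop: str
    out: list = field(default_factory=list)  # ids of :bar targets


def load_graph(nodes_text, edges_text):
    # Build the lookup once, like creating an index on :foo(id).
    by_id = {}
    for line in csv.reader(io.StringIO(nodes_text)):
        node = Node(int(line[0]), line[1], line[2])
        by_id[node.node_id] = node
    # Each edge is then two dict lookups instead of two full scans.
    for line in csv.reader(io.StringIO(edges_text)):
        n1, n2 = by_id[int(line[0])], by_id[int(line[1])]
        if n2.node_id not in n1.out:  # MERGE semantics: no duplicate edges
            n1.out.append(n2.node_id)
    return by_id


nodes_text = (
    "0,node0,Some Property 0\n1,node1,Some Property 1\n"
    "2,node2,Some Property 2\n4,node4,Some Property 4\n"
)
edges_text = "0,2\n0,4\n1,4\n2,4\n"
graph = load_graph(nodes_text, edges_text)
print({k: v.out for k, v in graph.items()})
```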
As Ron commented above, LOAD CSV is likely not the way to go for large datasets, and the CSV Batch Import tool he links to is great. If you cannot easily get your CSV into a shape that works with the Batch Import tool, then the Neo4j BatchInserter API is very simple to use:
http://docs.neo4j.org/chunked/stable/batchinsert.html