I have two csv files with nodes and edges.
nodes:
big, adjective
arm, noun
face, noun,
best, adjective
edges:
big, face
best, friend
face, arm
I want to create a graph with the relationships given by the edges file, and to group the nodes into noun and adjective.
I use this command to create the relationships:
LOAD CSV FROM 'file:copperfield_edges.csv' AS line
MERGE (g:G {word1 : line[0]})
WITH line, g
MERGE (j:J {word2 : line[1]})
WITH g, j
MERGE (g)-[:From_To]->(j);
But in this case each word can appear twice (once per label). How can I keep the word nodes and relationships unique, and also add the noun and adjective groups?
I want to get something like this http://joxi.ru/1A5QX6MH6LZ1AE
You're assigning the G label to all nodes from the first column and the J label to all nodes from the second column. Since every word has a single identifier (e.g. big, face), use one label for all of them, e.g. Word.
Try the following:
LOAD CSV FROM 'file:copperfield_edges.csv' AS line
MERGE (g:Word {word : line[0]})
MERGE (j:Word {word : line[1]})
MERGE (g)-[:From_To]->(j);
Based on your nodes CSV file, you can assign an additional label indicating whether the word is an adjective or a noun:
LOAD CSV FROM 'file:nodes.csv' AS line
MERGE (w:Word {word: line[0]})
FOREACH (n IN (CASE WHEN line[1] = "adjective" THEN [1] ELSE [] END) |
  SET w:Adjective )
FOREACH (n IN (CASE WHEN line[1] = "noun" THEN [1] ELSE [] END) |
  SET w:Noun )
Since you cannot set labels dynamically, I've had to use the FOREACH trick documented at http://www.markhneedham.com/blog/2014/06/17/neo4j-load-csv-handling-conditionals/
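If the APOC plugin happens to be installed (an assumption; it is a separate library and not part of the original answer), an alternative to the FOREACH trick is apoc.create.addLabels, which can set a label chosen at runtime:
LOAD CSV FROM 'file:nodes.csv' AS line
MERGE (w:Word {word: line[0]})
WITH w, line
// trim() handles the space after the comma in the sample file
CALL apoc.create.addLabels(w, [CASE trim(line[1]) WHEN 'adjective' THEN 'Adjective' ELSE 'Noun' END]) YIELD node
RETURN count(node);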
If your graph has more than a handful of nodes, consider creating an index before running LOAD CSV:
CREATE INDEX ON :Word(word)
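Once both files are loaded, a quick check that the groups came out as intended could look like this (an illustrative query, not part of the original answer):
MATCH (w:Word)
RETURN w.word AS word, labels(w) AS groups
ORDER BY word;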
I have the following graph stored in csv format:
graphUnioned.csv:
a b
b c
The above graph denotes a path from Node a to Node b. Note that the first column in the file is the source and the second column is the destination. By this logic the second path in the graph is from Node b to Node c, and the longest path in the graph is Node a to Node b to Node c.
I loaded the above CSV into Neo4j Desktop using the following command:
LOAD CSV WITH HEADERS FROM "file:\\graphUnioned.csv" AS csvLine
MERGE (s:s {s:csvLine.s})
MERGE (o:o {o:csvLine.o})
MERGE (s)-[]->(o)
RETURN *;
And then, to find the longest path, I run the following command:
match (n:s)
where (n:s)-[]->()
match p = (n:s)-[*1..]->(m:o)
return p, length(p) as L
order by L desc
limit 1;
However, unfortunately this command only gives me the path from Node a to Node b and does not return the longest path. Can someone please help me understand where I am going wrong?
There are two mistakes in your CSV import query.
First, you need to supply a relationship type when you MERGE a relationship between nodes; the query won't compile otherwise. You likely used one and dropped it when you pasted the query here.
Second, and this is the big one, your query merges nodes with different labels and different properties, which throws the results off. Your intent was to create 3 nodes with a longest path connecting them, but your query creates 4 nodes, in two isolated groups of two nodes each. In particular, it creates 2 b nodes, (:s {s:b}) and (:o {o:b}), each connected to a different node, because the nodes built from each CSV column are treated differently.
What you should be doing is using the same label and property key for all of the nodes involved, and this will allow the match to the b node to only refer to a single node and not create two:
LOAD CSV WITH HEADERS FROM "file:\\graphUnioned.csv" AS csvLine
MERGE (s:Node {value:csvLine.s})
MERGE (o:Node {value:csvLine.o})
MERGE (s)-[:REL]->(o)
RETURN *;
You'll also want an index on :Node(value) (or whatever your equivalent is when you import real data) so that your MERGEs and subsequent MATCHes are fast when performing lookups of the nodes by property.
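For example, using the same legacy syntax as the other index statements on this page (adjust the label and property to your real data):
CREATE INDEX ON :Node(value);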
Now, to get to your longest path query.
If you can assume that the start node has no incoming relationships, and that your end node has no outgoing relationships, then you can use a query like this:
match (start:Node)
where not ()-->(start)
match p = (start)-[*]->(end)
where not (end)-->()
return p, length(p) as L
order by L desc
limit 1;
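On the sample data above (with the corrected import), this returns the single path a → b → c with a length of 2.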
In Neo4j I am trying to visualize a small number of calls taken from a CSV file (fake numbers, sample below):
A,B
1,4
1,5
1,2
2,7
2,9
2,11
3,15
I am treating each column (A, B) as phone numbers: the numbers are the nodes, and the presence of a call between them (A to B) is the relationship.
Ideally the resulting graph should show multiple relationships between the nodes (e.g. the node with value 1 would have three connections to other nodes, one of them being node 2, which has another three connections; finally, the node with value 3 would have one connection but be separate from the rest).
The code I am trying:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:blahblahblah.csv" AS row
CREATE (A:phone {number: row.A})
CREATE (B:phone {number: row.B})
WITH A as a MATCH (a)-[:CALLED*]-(m)
RETURN a,m
And obviously it's producing repeated nodes and only single relationships, with no second-level arrows.
What am I doing wrong here?
This should be most efficient:
create constraint on (p:phone) assert p.number is unique;
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:blahblahblah.csv" AS row
with distinct row.A as value
MERGE (:phone {number: value});
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:blahblahblah.csv" AS row
with distinct row.B as value
MERGE (:phone {number: value});
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:blahblahblah.csv" AS row
MATCH (A:phone {number: row.A})
MATCH (B:phone {number: row.B})
MERGE (A)-[r:CALLED]->(B)
ON CREATE SET r.count = 1
ON MATCH SET r.count = r.count + 1;
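The ON CREATE / ON MATCH pair keeps a single :CALLED relationship per pair of numbers, while the count property records how many calls it represents.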
Not really sure what you want to query, though. E.g.:
MATCH (a:phone)-[r:CALLED]->(b)
RETURN a, sum(r.count) as calls
ORDER BY calls DESC LIMIT 10;
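If the goal is simply to see the call graph described in the question (1 linked to 4, 5 and 2; 2 linked to 7, 9 and 11; 3 connected only to 15), returning whole paths also works (a minimal illustrative query):
MATCH p = (:phone)-[:CALLED]->(:phone)
RETURN p;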
I am trying to get a CSV into Neo4j. Since it consists of log entries, I'd like to connect the nodes with a NEXT relationship whenever the corresponding log entries were created at consecutive times.
LOAD CSV WITH HEADERS FROM 'http://localhost/Export.csv' AS line
CREATE (:Entry { date: line[0], ...})
MATCH (n)
RETURN n
ORDER BY n:date
MATCH (a:Entry),(b:Entry),(c:Entry)
WITH p AS min(b:date)
WHERE a:date < b:date AND c.date = p
CREATE (a)-[r:NEXT]->(c)
The last four lines do not work, however. What I'm trying to do is get the earliest entry 'c' out of the group of entries 'b' that have a larger timestamp than 'a'. Can anyone help me out here?
Not sure if I understood your question correctly: you have a CSV file containing log records with a timestamp, one record per line, and you want to interconnect the events to form a linked list based on that timestamp?
In this case I'd split up the process into two steps:
first, using LOAD CSV, create a node with a date property for each line (a minimal sketch of this step is included after the statement below);
then connect the entries using e.g. a Cypher statement like this:
MATCH (e:Entry)
WITH e ORDER BY e.date DESC
WITH collect(e) as entries
FOREACH(i in RANGE(0, length(entries)-2) |
FOREACH(e1 in [entries[i]] |
FOREACH(e2 in [entries[i+1]] |
MERGE (e1)-[:NEXT]->(e2))))
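For the first step, a minimal LOAD CSV sketch could look like this (assuming the file has a header and the timestamp column is named date; adjust the property names to your actual file):
LOAD CSV WITH HEADERS FROM 'http://localhost/Export.csv' AS line
CREATE (:Entry { date: line.date });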
Hello, I am trying to import some data from a CSV file into Neo4j on my Ubuntu 12.04 machine.
The CSV file is a two-column data file with no header; its format is like:
12ffew3213,232rwe13
12ffew3213,5yur2ru2r
rwerwerw3,432rwe13
rwerwerw3,5yur2ru2r
The thing is that the values in column 0 and column 1 are not unique. For example, the data may have 3000 lines but only 100 unique column-0 values and 300 unique column-1 values.
I want to build a graph with the 100 unique column-0 nodes, the 300 unique column-1 nodes, and 3000 relationships between those nodes (if 12ffew3213,232rwe13 appears twice, there are 2 edges).
I am new to Neo4j and Cypher, and after trying CREATE and MERGE for a while I still cannot build unique nodes. I used something like:
LOAD CSV FROM 'file:///home/nate/Downloads/file.csv' AS line
MERGE (:A { number: toString(line[0])})-[:LIKES]->(:B { ID: toString(line[1])})
Any ideas? Thanks in advance!
Here's what you do.
LOAD CSV FROM 'file:///home/nate/Downloads/file.csv' AS line
MERGE (n:A {number : line[0]})
WITH line, n
MERGE (m:B {ID : line[1]})
WITH m,n
MERGE (n)-[:LIKES]->(m);
You first create or match the :A node, then create or match the :B node, then create or match the relationship. The WITH clauses collect the results at each point in the sequence to use in the next. To find out more about WITH clauses, read Section 9.5 in the Neo4j Manual.
The same works for a CSV with headers. If the header is 'head1','head2', the code becomes:
LOAD CSV WITH HEADERS FROM 'file:///home/nate/Downloads/file.csv' AS line
MERGE (n:A {number : line.head1})
WITH line, n
MERGE (m:B {ID : line.head2})
WITH m,n
MERGE (n)-[:LIKES]->(m);
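Note that MERGE on the relationship collapses repeated lines into a single :LIKES edge. If you really want one relationship per CSV line, as the question describes (2 edges when 12ffew3213,232rwe13 appears twice), one option is to CREATE the relationship instead (a sketch, keeping the rest of the pattern the same):
LOAD CSV FROM 'file:///home/nate/Downloads/file.csv' AS line
MERGE (n:A {number : line[0]})
WITH line, n
MERGE (m:B {ID : line[1]})
WITH m, n
CREATE (n)-[:LIKES]->(m);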
Say I have a CSV file containing node information, each line with a unique ID in the first column, and another CSV file containing the edges, describing edges between the nodes (via their unique IDs). The following Cypher code successfully loads the nodes and then creates the edges. However, can I make it more efficient? My real data set has millions of nodes and tens of millions of edges. Obviously I should use periodic commits and create an index, but can I somehow avoid matching for every single edge and instead use the fact that I already know the unique node IDs for each edge I want to build? Or am I going about this all wrong? I would like to do this entirely in Cypher (no Java).
load csv from 'file:///home/user/nodes.txt' as line
create (:foo { id: toInt(line[0]), name: line[1], someprop: line[2]});
load csv from 'file:///home/user/edges.txt' as line
match (n1:foo { id: toInt(line[0])} )
with n1, line
match (n2:foo { id: toInt(line[1])} )
// if I had an index I'd use it here with: using index n2:foo(name)
merge (n1) -[:bar]-> (n2) ;
match p = (n)-->(m) return p;
nodes.txt:
0,node0,Some Property 0
1,node1,Some Property 1
2,node2,Some Property 2
3,node3,Some Property 3
4,node4,Some Property 4
5,node5,Some Property 5
6,node6,Some Property 6
7,node7,Some Property 7
8,node8,Some Property 8
9,node9,Some Property 9
10,node10,Some Property 10
...
edges.txt:
0,2
0,4
0,8
0,13
1,4
1,8
1,15
2,4
2,6
3,4
3,7
3,8
3,11
4,10
...
As Ron commented above, LOAD CSV is likely not the way to go for large datasets, and the CSV batch import tool he links to is great. If you find you cannot easily massage your CSV into a form that works with the batch import tool, then the Neo4j BatchInserter API is very simple to use:
http://docs.neo4j.org/chunked/stable/batchinsert.html
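That said, if you want to stay entirely in Cypher as the question asks, a sketch of the index plus periodic-commit variant hinted at in the question could look like this (illustrative only, using the same 2.x-era syntax as the question; run the index statement first and let it populate before the import):
CREATE INDEX ON :foo(id);
USING PERIODIC COMMIT 10000
LOAD CSV FROM 'file:///home/user/edges.txt' AS line
MATCH (n1:foo { id: toInt(line[0]) })
MATCH (n2:foo { id: toInt(line[1]) })
MERGE (n1)-[:bar]->(n2);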