I can't figure out how to create links out of CSV tables in Neo4j. I've read several parts of the manual (match, loadCSV, etc), that free book, and several tutorials I've found. None of them seems to contemplate my use case (which is weird, because I think it's a pretty simple use case). I've tried adapting the code they have in all sorts of ways, but nothing seems to work.
So, I have three CSV tables: parent companies, child companies, and parent-child pairs. I begin by loading the first two tables (and that works fine - all the properties are there, all the info is correct):
LOAD CSV FROM "file:/C:/Users/thiago.marzagao/Desktop/CSVs/children.csv" AS node
CREATE (:Children {id: node[0], name: node[1]})
LOAD CSV FROM "file:/C:/Users/thiago.marzagao/Desktop/CSVs/parents.csv" AS node
CREATE (:Parent {id: node[0], name: node[1]})
Now, here's the structure of the third table:
child_id,parent_id
Here's some of the things I've tried:
LOAD CSV FROM "file:/C:/Users/thiago.marzagao/Desktop/CSVs/link.csv" AS rels
MATCH (FROM {Parent: rels[1]}), (TO {Children: rels[0]})
CREATE (Parent)-[:OWNS]->(Children)
This doesn't give me an error, but it returns zero rows.
LOAD CSV FROM "file:/C:/Users/thiago.marzagao/Desktop/CSVs/link.csv" AS rels
MATCH (FROM {id: rels[1]}), (TO {id: rels[0]})
CREATE (Parent)-[:OWNS]->(Children)
This doesn't give me an error, but it just returns a bunch of pairs of empty nodes. So, it creates the links, but somehow it doesn't link the actual nodes.
LOAD CSV FROM "file:/C:/Users/thiago.marzagao/Desktop/CSVs/link.csv" AS rels
MATCH (FROM {Parent.id: rels[1]}), (TO {Children.id: rels[0]})
CREATE (Parent)-[:OWNS]->(Children)
This gives me a syntax error (Neo.ClientError.Statement.InvalidSyntax)
I also tried several variations of the code blocks above, but to no avail. So, what am I doing wrong? (I'm on Neo4j 2.1.6, in case that matters.)
In your Cypher statement, the CREATE clause does not reference the identifiers bound in the MATCH clause (FROM and TO). Cypher therefore treats Parent and Children as new, unbound identifiers and creates new empty nodes.
Look at the difference:
MATCH (FROM {id: rels[1]}), (TO {id: rels[0]})
CREATE (Parent)-[:OWNS]->(Children)
Instead, it should be:
LOAD CSV FROM "file:/C:/Users/thiago.marzagao/Desktop/CSVs/link.csv" AS rels
MATCH (Parent {id: rels[1]}), (Children {id: rels[0]})
CREATE (Parent)-[:OWNS]->(Children)
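Because both the parent and the child nodes carry an id property, matching on id alone could also pick up a node of the other kind if the two id ranges overlap. A minimal sketch that adds the labels used in the question's CREATE statements (the identifiers p and c are only illustrative):

```cypher
LOAD CSV FROM "file:/C:/Users/thiago.marzagao/Desktop/CSVs/link.csv" AS rels
// rels[0] is child_id, rels[1] is parent_id
MATCH (p:Parent {id: rels[1]}), (c:Children {id: rels[0]})
// reuse the identifiers bound by MATCH so that no new nodes are created
CREATE (p)-[:OWNS]->(c)
```

Restricting the match by label also lets Neo4j use an index on :Parent(id) and :Children(id), if one exists.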
Related
I am trying to create a relationship between two existing nodes. I am reading the node IDs from a CSV and creating the relationship with the following query:
LOAD CSV WITH HEADERS FROM "file:///8245.csv" AS f
MATCH (Ev:Event) where id(Ev) =f.first
MATCH (Ev_sec:Event) where id(Ev_sec) = f.second
WITH Ev, Ev_sec
MERGE (Ev) - [:DF_mat] - > (Ev_sec)
However, it is not changing anything in the database. How can I solve this problem?
Thanks!
I solved the problem. LOAD CSV reads every field as a string, so comparing id(Ev), an integer, with f.first never matched. I queried for the node IDs again, this time exporting them as strings (using toString(ID(node))), and then converted them back to integers with toInteger() while loading. The query is as follows:
LOAD CSV WITH HEADERS FROM "file:///8245_new.csv" AS csvLine
match (ev:Event) where id(ev)=toInteger(csvLine.first)
match (ev_sec:Event) where id(ev_sec)=toInteger(csvLine.second)
merge (ev)-[:DF_mat]-> (ev_sec)
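A quick way to inspect what LOAD CSV actually delivers before matching on it is to return the raw and the converted value side by side. This is only a sketch, reusing the file and column names from the query above; the aliases are illustrative:

```cypher
LOAD CSV WITH HEADERS FROM "file:///8245_new.csv" AS csvLine
// the raw field is a string; toInteger() yields the number that id() can match
RETURN csvLine.first AS rawValue, toInteger(csvLine.first) AS asInteger
LIMIT 5
```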
Apologies as I am new to neo4j and struggling with what I imagine is a very simple example.
I would like to model an org chart which I have stored as a csv like so
id,name,manager_id
1,allan,2
2,bob,4
3,john,2
4,sam,
5,Jim,2
Note that Bob has 3 direct reports and Bob reports into Sam who doesn't report into anyone.
I would like to produce a graph which shows the management chain. I have tried the following, but it produces relationships which are disjoint from the people:
LOAD CSV WITH HEADERS FROM "file:///employees.csv" AS csvLine
CREATE (p:Person {id: csvLine.id, name: csvLine.name})
CREATE (p)-[:MANAGED_BY {manager: csvLine.manager_id}]->(p)
This query creates a bunch of self-referencing relationships. Is there any way to populate the graph with one command over the single CSV? I must be missing something, and any help is appreciated. Thanks
I think this is what you are looking for.
In your query you are creating a relationship between p and p, hence the self-referencing relationships.
I added a coalesce() call to deal with people who do not have a manager_id value. This way Sam reports to himself.
LOAD CSV WITH HEADERS FROM "file:///employees.csv" AS csvLine
// create or match the person in the left column
MERGE (p:Person {id: csvLine.id })
// always set the name: the node may already exist without one,
// created earlier as somebody else's manager
SET p.name = csvLine.name
// create or match the person/manager in the right column
MERGE (p1:Person {id: coalesce(csvLine.manager_id, csvLine.id) })
// create the reporting relationship
CREATE (p)-[:MANAGED_BY]->(p1)
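If a self-referencing relationship for the top of the chain is not wanted, the rows without a manager can be filtered out before the relationship is created. The following is a sketch under the assumption that an empty manager_id field is read as null; the identifier m is illustrative:

```cypher
LOAD CSV WITH HEADERS FROM "file:///employees.csv" AS csvLine
// create or match the person and record the name
MERGE (p:Person {id: csvLine.id})
SET p.name = csvLine.name
// drop rows that have no manager (Sam in the sample data)
WITH p, csvLine
WHERE csvLine.manager_id IS NOT NULL
// create or match the manager, then link the two
MERGE (m:Person {id: csvLine.manager_id})
MERGE (p)-[:MANAGED_BY]->(m)
```

Using MERGE for the relationship as well means the import can be rerun without duplicating MANAGED_BY relationships.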
I have created a graph with a constraint on the primary id. In my CSV the same primary id appears on several rows, but the other properties differ. I want to create relationships based on those other properties.
I have tried changing the code several times, but it does not do what I need.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///Trial.csv' AS line FIELDTERMINATOR '\t'
MATCH (n:Trial {id: line.primary_id})
with line.cui= cui
MATCH (m:Intervention)
where m.id = cui
MERGE (n)-[:HAS_INTERVENTION]->(m);
I already have the Intervention nodes in the graph, as well as the trials. What I am trying to do is match a trial with an intervention by id and create only the relationship. Instead, the query also creates new nodes.
This is a sample of my data: the same primary id with different cuis, and I am trying to match on cui:
You can use the following query, which finds the Trial and Intervention nodes by primary_id and cui respectively and creates the relationship between them.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///Trial.csv' AS line FIELDTERMINATOR '\t'
MATCH (n:Trial {id: line.primary_id}), (m:Intervention {id: line.cui})
MERGE (n)-[:HAS_INTERVENTION]->(m);
The behavior you observed is caused by 2 aspects of the Cypher language:
The WITH clause drops all existing variables except for the ones explicitly specified in the clause. Therefore, since your WITH clause does not specify the n node, n becomes an unbound variable after the clause.
The MERGE clause will create its entire pattern if any part of the pattern does not already exist. Since n is not bound to anything, the MERGE clause would go ahead and create the entire pattern (including the 2 nodes).
So, you could have fixed the issue by simply specifying the n variable in the WITH clause, as in:
WITH n, line.cui AS cui
But @Raj's query is even better, avoiding the need for WITH entirely.
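For completeness, here is a sketch of the original query with only the WITH clause changed, so that n stays bound and cui is a proper alias. Everything else is taken from the question as posted:

```cypher
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///Trial.csv' AS line FIELDTERMINATOR '\t'
MATCH (n:Trial {id: line.primary_id})
// carry n forward and alias the cui column
WITH n, line.cui AS cui
MATCH (m:Intervention)
WHERE m.id = cui
// both n and m are bound here, so MERGE only creates the relationship
MERGE (n)-[:HAS_INTERVENTION]->(m);
```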
I can load CSV into Neo4j for a specific label (say PERSON) and the nodes are created under the label PERSON.
I also have another CSV that describes the relationships between the persons. It looks like this:
name1, relation, name2
a, LOVE, b
a, HATE, c
I want to create a relationship between these pairs and the relationship thus created should be "LOVE", "HATE", etc, instead of a rigid RELATION as done by the below script:
load csv with headers from "file:///d:/Resources/Neo4j/person-rel.csv" as p
match (a:PERSON) where a.name=p.name1
match (b:PERSON) where b.name=p.name2
merge (a)-[r:REL {relation: p.REL}]->(b)
By doing this, I have a bunch of REL-type relations but not LOVE- and HATE-relations.
In other words, I want the REL in the last line of the script to be assigned dynamically, so that I can then query all the relationship types using the Neo4j API.
Is this possible?
You can install the APOC library and then use apoc.merge.relationship
apoc.merge.relationship(startNode, relType, {key:value, ...}, {key:value, ...}, endNode) - merge relationship with dynamic type
load csv with headers from "file:///d:/Resources/Neo4j/person-rel.csv" as p
match (a:PERSON) where a.name=p.name1
match (b:PERSON) where b.name=p.name2
call apoc.merge.relationship(a,p.relation,{},{},b) yield rel
return count(*);
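To check afterwards that the relationships really carry the types from the CSV, they can be grouped by type() in plain Cypher. A small sketch, assuming the PERSON label from the question; the aliases are illustrative:

```cypher
MATCH (:PERSON)-[r]->(:PERSON)
// one row per relationship type, e.g. LOVE and HATE for the sample file
RETURN type(r) AS relType, count(*) AS relCount
```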
I had an XML file which I wanted to visualize in Neo4j (as a graph with interconnected nodes). The XML file had the following hierarchy:
<Organism>
<Enzyme>
<Motif>
I was successful in creating the entire graph. When I finished, I realized that different organisms often share an enzyme, and that two different enzymes often share motifs. As a result, there is a lot of redundancy in my graph, with the same enzymes or motifs occurring multiple times. Is there an easy way to remove all but one of each duplicated node (either an enzyme or a motif) and then connect the remaining node to the different nodes? Or will I have to start from scratch?
My CREATE statements looked like this:
CREATE (jejunistrain81176:Organism { name: "Campylobacter jejuni strain 81-176" })
CREATE (jejunistrain81176_e1:Enzyme { name: "CjeFIII" })
CREATE (jejunistrain81176_m1:Motif { name: "GCAAGG" })
CREATE UNIQUE (jejunistrain81176)-[:HAS_ENZYME]->(jejunistrain81176_e1)
CREATE UNIQUE (jejunistrain81176_e1)-[:HAS_MOTIF]->(jejunistrain81176_m1)
I tried replacing all the CREATE with MERGE but it gives me the following error :
Invalid input '(': expected whitespace, comment, '=', node labels, MapLiteral, a parameter, a relationship pattern, ON, LOAD CSV, START, MATCH, UNWIND, MERGE, CREATE, SET, DELETE, REMOVE, FOREACH, WITH, RETURN, UNION, ';' or end of input (line 21, column 14)
Replacing both CREATE and CREATE UNIQUE with MERGE should actually work.
You'd have to create an index or constraint for your :Label(name) pairs.
I think reimporting the data will be faster. If you want to delete and reconnect nodes instead, you need to know which relationships you're looking at.
Something like:
MATCH (e:Enzyme)
WITH e.name AS name, count(*) AS cnt, collect(e) AS enzymes
WHERE cnt > 1
// keep the first node of each name, treat the rest as duplicates
WITH enzymes[0] AS first, enzymes[1..] AS duplicates
UNWIND duplicates AS enzyme
// move incoming HAS_ENZYME relationships to the kept node
MATCH (enzyme)<-[rel:HAS_ENZYME]-(organism)
MERGE (first)<-[newRel:HAS_ENZYME]-(organism) ON CREATE SET newRel = rel
DELETE rel
WITH DISTINCT first, enzyme
// move outgoing HAS_MOTIF relationships to the kept node
MATCH (enzyme)-[rel:HAS_MOTIF]->(motif)
MERGE (first)-[newRel:HAS_MOTIF]->(motif) ON CREATE SET newRel = rel
DELETE rel
DELETE enzyme;
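For the reimport route, this is a sketch of the statements from the question rewritten with MERGE. The error position (column 14) suggests that only the word CREATE was replaced, leaving MERGE UNIQUE behind, but that is an assumption. The short identifiers o, e and m are illustrative:

```cypher
// MERGE matches an existing node with this label and name, or creates it once
MERGE (o:Organism {name: "Campylobacter jejuni strain 81-176"})
MERGE (e:Enzyme {name: "CjeFIII"})
MERGE (m:Motif {name: "GCAAGG"})
// plain MERGE replaces CREATE UNIQUE; there is no MERGE UNIQUE
MERGE (o)-[:HAS_ENZYME]->(e)
MERGE (e)-[:HAS_MOTIF]->(m)
```

With an index or constraint on the name property of each label, the node lookups stay fast as the graph grows.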