I had an XML file which I wanted to visualize in Neo4j (as a graph with interconnected nodes). The XML file had the following hierarchy:
<Organism>
<Enzyme>
<Motif>
I was successful in creating the entire graph. When I finished, I realized that different organisms often share a common enzyme, and that two different enzymes can share common motifs. As a result, there is now a lot of redundancy in my graph, with the same enzymes or motifs occurring multiple times. Is there an easy way to remove all but one copy of each duplicated node (either an enzyme or a motif) and then connect the remaining copy to the different nodes? Or will I have to start from scratch?
My CREATE statements looked like this:
CREATE (jejunistrain81176:Organism { name: "Campylobacter jejuni strain 81-176" })
CREATE (jejunistrain81176_e1:Enzyme { name: "CjeFIII" })
CREATE (jejunistrain81176_m1:Motif { name: "GCAAGG" })
CREATE UNIQUE (jejunistrain81176)-[:HAS_ENZYME]->(jejunistrain81176_e1)
CREATE UNIQUE (jejunistrain81176_e1)-[:HAS_MOTIF]->(jejunistrain81176_m1)
I tried replacing all the CREATE statements with MERGE, but it gives me the following error:
Invalid input '(': expected whitespace, comment, '=', node labels, MapLiteral, a parameter, a relationship pattern, ON, LOAD CSV, START, MATCH, UNWIND, MERGE, CREATE, SET, DELETE, REMOVE, FOREACH, WITH, RETURN, UNION, ';' or end of input (line 21, column 14)
Replacing create and create unique with merge should actually work.
You'd have to create an index or constraint for your :Label(name) pairs.
I think you'd be faster reimporting the data. If you want to delete and reconnect nodes instead, you need to know which relationships you're looking at.
Something like:
MATCH (e:Enzyme)
WITH e.name AS name, count(*) AS cnt, collect(e) AS enzymes
WHERE cnt > 1
// keep the first enzyme of each name, remove the rest
WITH enzymes[0] AS first, enzymes[1..] AS remove
UNWIND remove AS enzyme
MATCH (enzyme)<-[rel:HAS_ENZYME]-(organism)
MERGE (first)<-[newRel:HAS_ENZYME]-(organism) ON CREATE SET newRel = rel
DELETE rel
WITH DISTINCT first, enzyme
// the relationship type has to match the HAS_MOTIF used at import
MATCH (enzyme)-[rel:HAS_MOTIF]->(motif)
MERGE (first)-[newRel:HAS_MOTIF]->(motif) ON CREATE SET newRel = rel
DELETE rel
DELETE enzyme;
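To make the intent of that query easier to check, here is a minimal plain-Python sketch of the same idea: keep the first node per (label, name) and re-point every relationship of the duplicates to it. It is only a model of the logic, not Neo4j code; the function name `dedupe`, the node ids and the organism names are made up for illustration.

```python
def dedupe(nodes, rels):
    """nodes: {node_id: (label, name)}; rels: set of (source_id, type, target_id).

    Keeps the first node seen for each (label, name) pair and re-points
    every relationship that touched a duplicate to that survivor.
    """
    survivor = {}  # (label, name) -> id of the node that is kept
    stand_in = {}  # every node id -> id of the node that replaces it
    for node_id, label_and_name in nodes.items():
        stand_in[node_id] = survivor.setdefault(label_and_name, node_id)

    kept = {nid: ln for nid, ln in nodes.items() if stand_in[nid] == nid}
    # A set collapses relationships that become identical after re-pointing,
    # which is roughly the role MERGE plays in the Cypher query.
    repointed = {(stand_in[s], t, stand_in[d]) for s, t, d in rels}
    return kept, repointed


# Hypothetical sample: two organisms that each got their own copy of CjeFIII.
nodes = {
    "o1": ("Organism", "organism A"),
    "o2": ("Organism", "organism B"),
    "e1": ("Enzyme", "CjeFIII"),
    "e2": ("Enzyme", "CjeFIII"),
    "m1": ("Motif", "GCAAGG"),
}
rels = {
    ("o1", "HAS_ENZYME", "e1"),
    ("o2", "HAS_ENZYME", "e2"),
    ("e1", "HAS_MOTIF", "m1"),
    ("e2", "HAS_MOTIF", "m1"),
}
kept, repointed = dedupe(nodes, rels)
print(sorted(kept))
print(sorted(repointed))
```

After the call, both organisms point at the single remaining enzyme node and the duplicate HAS_MOTIF relationship has collapsed into one.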
I have created a graph with a constraint on the primary id. In my CSV, a primary id can appear in several rows, but the other properties differ. Based on those other properties I want to create relationships.
I tried multiple times to change the code but it does not do what I need.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///Trial.csv' AS line FIELDTERMINATOR '\t'
MATCH (n:Trial {id: line.primary_id})
WITH line.cui AS cui
MATCH (m:Intervention)
where m.id = cui
MERGE (n)-[:HAS_INTERVENTION]->(m);
I already have the Intervention nodes in the graph, as well as the trials. So what I am trying to do is match each trial with an intervention by id and create only the relationship. Instead, the query also creates new nodes.
This is a sample of my data: the same primary id appears with different cuis, and I am trying to match on cui:
You can refer to the following query, which finds the Trial and Intervention nodes by primary_id and cui respectively and creates the relationship between them.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///Trial.csv' AS line FIELDTERMINATOR '\t'
MATCH (n:Trial {id: line.primary_id}), (m:Intervention {id: line.cui})
MERGE (n)-[:HAS_INTERVENTION]->(m);
The behavior you observed is caused by 2 aspects of the Cypher language:
The WITH clause drops all existing variables except for the ones explicitly specified in the clause. Therefore, since your WITH clause does not specify the n node, n becomes an unbound variable after the clause.
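The MERGE clause will create its entire pattern if no match for the whole pattern already exists, reusing only the variables that are already bound. Since n is not bound to anything, MERGE treats it as a new variable, so whenever no matching pattern is found it goes ahead and creates a new, empty node for n along with the relationship.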
So, you could have fixed the issue by simply specifying the n variable in the WITH clause, as in:
WITH n, line.cui AS cui
But @Raj's query is even better, avoiding the need for WITH entirely.
I would like to get a node, delete all outgoing relationships of a certain type and then add back relationships.
The problem I have is that once I grab the node, it still keeps its previous relationships even after the delete, so instead of ending up with 1 relationship it keeps doubling whatever it has: 1->2->4->8, etc.
Sample graph:
CREATE (a:Basic {name:'a'})
CREATE (b:Basic {name:'b'})
CREATE (c:Basic {name:'c'})
CREATE (a)-[:TO]->(b)
CREATE (a)-[:SO]->(c)
This is the query that deletes the previous relationships and then adds the new ones. (This is just a brief sample; in reality it wouldn't add back the same relationships, but would more than likely point to a different node.)
MATCH (a:Basic {name:'a'})
WITH a
OPTIONAL MATCH (a)-[r:TO|SO]->()
DELETE r
WITH a
MATCH (b:Basic {name:'b'})
CREATE (a)-[:TO]->(b)
WITH a
MATCH (c:Basic {name:'c'})
CREATE (a)-[:SO]->(c)
If I change the CREATE to MERGE then it solves the problem, but it feels odd to have to merge when I know that I just deleted all the relationships. Is there a way to update "a" midway through the query so it reflects the changes? I would like to keep it in one query
The behavior you observed is due to the subtle fact that the OPTIONAL MATCH clause generated 2 rows of data, which caused all subsequent operations to be done twice.
To force there to be only a single row of data after the DELETE clause, you can use WITH DISTINCT a (instead of WITH a) right after the DELETE clause, like this:
MATCH (a:Basic {name:'a'})
OPTIONAL MATCH (a)-[r:TO|SO]->()
DELETE r
WITH DISTINCT a
MATCH (b:Basic {name:'b'})
CREATE (a)-[:TO]->(b)
WITH a
MATCH (c:Basic {name:'c'})
CREATE (a)-[:SO]->(c)
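If the row explanation feels abstract, the following plain-Python sketch models it. It is not how Neo4j is implemented, just a toy in which every clause works on a list of rows; the function `run` and its `dedupe_rows` flag are hypothetical names used only here.

```python
def run(rels, dedupe_rows):
    """rels: list of (source, type, target). Returns the list after the query."""
    # MATCH (a {name:'a'}) OPTIONAL MATCH (a)-[r:TO|SO]->(): one row per
    # matching relationship, or a single row with r = None if there are none.
    matched = [r for r in rels if r[0] == "a" and r[1] in ("TO", "SO")]
    rows = [{"a": "a", "r": r} for r in matched] or [{"a": "a", "r": None}]

    # DELETE r
    rels = [r for r in rels if r not in matched]

    # WITH a keeps one row per incoming row; WITH DISTINCT a collapses them.
    rows = [{"a": row["a"]} for row in rows]
    if dedupe_rows:
        rows = [dict(t) for t in dict.fromkeys(tuple(row.items()) for row in rows)]

    # Each CREATE runs once per remaining row.
    for row in rows:
        rels.append((row["a"], "TO", "b"))
    for row in rows:
        rels.append((row["a"], "SO", "c"))
    return rels


start = [("a", "TO", "b"), ("a", "SO", "c")]
print(len(run(start, dedupe_rows=False)))                           # WITH a
print(len(run(run(start, dedupe_rows=False), dedupe_rows=False)))   # run twice
print(run(start, dedupe_rows=True))                                 # WITH DISTINCT a
```

With plain WITH a, the two rows produced by the OPTIONAL MATCH survive and every CREATE fires twice, which is where the doubling comes from; collapsing to one row first leaves exactly one TO and one SO relationship.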
I have the following file A.csv
"NODE","PREDECESSORS"
"1",""
"2","1"
"3","1;2"
I want to create the nodes 1, 2 and 3 and the relationships 1->2->3 and 1->3.
I have already tried this:
LOAD CSV WITH HEADERS FROM 'file:///A.csv' AS line
CREATE (:Task { NODE: line.NODE, PREDECESSORS: SPLIT(line.PREDECESSORS ';')})
FOREACH (value IN line.PREDECESSORS |
MERGE (PREDECESSORS:value)-[r:RELATIONSHIP]->(NODE) )
But it does not work, that is, it does not create any relationship.
Please, might you help me?
The problem is in your MERGE:
MERGE (PREDECESSORS:value)-[r:RELATIONSHIP]->(NODE)
This is merging a :value labeled node and assigning it to the variable PREDECESSORS, which can't be what you want to do.
A better approach would be to not save the predecessor data in the node at all, and just use it to match the relevant nodes and create the relationships.
It will also help to have an index on :Task(NODE) so your matches to the predecessors are quick.
Remember also that Cypher does not run the entire query for each row in turn; rather, each operation in the query is processed for each row. So once the CREATE has executed, all nodes will have been created, and there's no need to MERGE the predecessor nodes.
Try something like this:
LOAD CSV WITH HEADERS FROM 'file:///A.csv' AS line
CREATE (node:Task { NODE: line.NODE})
WITH node, SPLIT(line.PREDECESSORS, ';') as predecessors
MATCH (p:Task)
WHERE p.NODE in predecessors
MERGE (p)-[:RELATIONSHIP]->(node)
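As a quick sanity check of which relationships that query should produce, here is a small plain-Python sketch that parses the same A.csv content and applies the same split on ';'. It only models the logic; the function name `edges_from_csv` is made up for illustration.

```python
import csv
import io

CSV_TEXT = '''"NODE","PREDECESSORS"
"1",""
"2","1"
"3","1;2"
'''


def edges_from_csv(text):
    """Return (nodes, edges) where each edge is (predecessor, node)."""
    nodes, edges = [], []
    for line in csv.DictReader(io.StringIO(text)):
        nodes.append(line["NODE"])
        # "".split(";") yields [""], so empty entries are filtered out here;
        # in the Cypher version the MATCH simply finds no Task for "".
        for pred in line["PREDECESSORS"].split(";"):
            if pred:
                edges.append((pred, line["NODE"]))
    return nodes, edges


nodes, edges = edges_from_csv(CSV_TEXT)
print(nodes)
print(edges)
```

The resulting edges are 1->2, 1->3 and 2->3, which is the 1->2->3 plus 1->3 shape asked for.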
I have a set of CSV files with duplicate data, i.e. the same row might (and does) appear in multiple files. Each row is uniquely identified by one of the columns (id) and has quite a few other columns that indicate properties, as well as required relationships (i.e. ids of other nodes to link to). The files all have the same format.
My problem is that, due to size and number of the files, I want to avoid processing the rows that already exist - I know that as long as id is the same, the contents of the rows will be the same across the files.
Can any cypher wizard advise how to write a query that would create the node, set all the properties and create all the relationship if a node with given id does not exist, but skip the action altogether if such node is found? I tried with MERGE ON CREATE, something along the lines of:
LOAD CSV WITH HEADERS FROM "..." AS row
MERGE (f:MyLabel {id:row.uniqueId})
ON CREATE SET f....
WITH f,row
MATCH (otherNode:OtherLabel {id : row.otherNodeId})
MERGE (f) -[:REL1] -> (otherNode)
Unfortunately, ON CREATE only prevents the properties from being set again; I couldn't work out how to skip the merging of the relationships as well (only one is shown here, but there are quite a few more).
Thanks in advance!
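You can just optionally match the node and then skip the rows where it was found, by filtering with WHERE existing IS NULL. Note that the filter has to follow a WITH: a WHERE attached directly to OPTIONAL MATCH only restricts the optional pattern and never removes rows.
Make sure you have an index or constraint on :MyLabel(id)
LOAD CSV WITH HEADERS FROM "..." AS row
OPTIONAL MATCH (existing:MyLabel {id:row.uniqueId})
WITH row, existing
WHERE existing IS NULL
MERGE (f:MyLabel {id:row.uniqueId})
ON CREATE SET f....
WITH f,row
MATCH (otherNode:OtherLabel {id : row.otherNodeId})
MERGE (f) -[:REL1] -> (otherNode)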
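The skip-if-present idea can be sketched in plain Python as follows. This is only a model of the intended behaviour under the assumption that rows are handled one at a time, not Neo4j code; `load_rows` and the sample rows are hypothetical.

```python
def load_rows(rows, nodes, rels):
    """rows: iterable of dicts with 'uniqueId' and 'otherNodeId'.

    nodes: dict keyed by (label, id); rels: set of (from_key, type, to_key).
    Returns how many rows were skipped because the node already existed.
    """
    skipped = 0
    for row in rows:
        key = ("MyLabel", row["uniqueId"])
        if key in nodes:
            # Node already present: skip properties AND relationships.
            skipped += 1
            continue
        nodes[key] = {"id": row["uniqueId"]}
        other = ("OtherLabel", row["otherNodeId"])
        if other in nodes:  # like MATCH: no relationship if the target is missing
            rels.add((key, "REL1", other))
    return skipped


nodes = {("OtherLabel", "x"): {"id": "x"}}
rels = set()
file1 = [{"uniqueId": "1", "otherNodeId": "x"}]
file2 = [{"uniqueId": "1", "otherNodeId": "x"},
         {"uniqueId": "2", "otherNodeId": "x"}]
skipped_first = load_rows(file1, nodes, rels)
skipped_second = load_rows(file2, nodes, rels)
print(skipped_first, skipped_second, len(nodes), len(rels))
```

The second file repeats the row with id 1, so that row is skipped entirely and only the row with id 2 creates a node and a relationship.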
I'm modeling a "tag cloud" with the graph:
(t:Tag {name:'cypher'})-[:IN]->(g:TagGroup)<-[:TAGGED]-(x)
I.e., a named tag is part of a "TagGroup", to which zero or more nodes are "TAGGED". I chose this design because I want the ability to combine two or more named tags (e.g. "cypher" and "neo4j") so that both (Tag)s are [IN] the new (TagGroup), and the new (TagGroup) is the endpoint for the union of all nodes that were previously [TAGGED].
My only (not very pleasing) attempt is:
match (t:Tag {name:'cypher'})-[i:IN]->(g:TagGroup),
(t2:Tag {name:'neo4j'})-[:IN]->(g2:TagGroup)<-[y:TAGGED]-(x)
create (t2)-[:IN]->(g)
create unique (g)<-[:TAGGED]-(x)
with g2 as g2
match (g2)<-[r]->() delete g2,r
My main issue is that it only combines two nodes, and it doesn't feel very efficient (although I have no alternatives to compare it with). Ideally I'd be able to combine an arbitrary set of (Tag)s by name.
Any ideas if this can be done with Cypher, and if so, how?
You can use labels instead of creating separate tag groups.
E.g., if the tags neo4j and cypher come under a tag group, say XYZ, then:
MERGE (a:Tag {name: "neo4j"})-[:TAGGED]->(x)
MERGE (b:Tag {name: "cypher"})-[:TAGGED]->(x)
set a :XYZ , b :XYZ
So the next time you want the tags of a particular group that are TAGGED to a particular post x:
MATCH (a:Tag:XYZ)-[:TAGGED]->(x) return a.name