Merge statement in Cypher - neo4j

I came across this statement in an Intro to Cypher video:
Ignoring the last MERGE statement, does MERGE essentially do an INSERT ... ON DUPLICATE KEY? For example:
MERGE (a:Person {name: "Ann"})
ON CREATE SET a.twitter = "#ann"
Would correspond to:
INSERT INTO Person (name) VALUES ("Ann")
ON DUPLICATE KEY SET twitter = "#ann"
And by extension, if there is a MERGE on a node that doesn't already exist, does it act like CREATE?

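Yes, MERGE is a match-or-create: it looks for the pattern and creates it only when no match exists, so a MERGE on a node that doesn't exist behaves like a CREATE. Note that it is not limited to key fields; it takes into account all the properties you provide in the MERGE clause. One caveat on the analogy: ON CREATE SET runs only when the node is created, so the closer counterpart of the ON DUPLICATE KEY branch is ON MATCH SET. See also https://neo4j.com/docs/cypher-manual/current/clauses/merge/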
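To make the match-or-create behaviour concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not the Neo4j driver API: the `nodes` list and the `merge` function are hypothetical names introduced only for illustration.

```python
# Toy in-memory model of MERGE semantics; this is NOT the Neo4j driver API.
nodes = []  # each node is a dict: {"label": ..., "props": {...}}


def merge(label, match_props, on_create=None, on_match=None):
    """Find a node with this label and ALL of match_props, else create it.

    Returns (node, created) so the caller can see which branch ran.
    """
    for node in nodes:
        if node["label"] == label and all(
            node["props"].get(k) == v for k, v in match_props.items()
        ):
            node["props"].update(on_match or {})  # plays the role of ON MATCH SET
            return node, False
    # No match: create the node, then apply the ON CREATE SET properties.
    node = {"label": label, "props": {**match_props, **(on_create or {})}}
    nodes.append(node)
    return node, True


# First call creates Ann and applies the on_create properties.
ann, created = merge("Person", {"name": "Ann"}, on_create={"twitter": "#ann"})
# Second call matches the existing node, so on_create is ignored.
again, created_again = merge("Person", {"name": "Ann"}, on_create={"twitter": "#other"})
```

In this model the second call returns the same node and leaves `twitter` untouched, which mirrors why ON CREATE SET is not an "update on duplicate" branch.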

Related

Creating property-less nodes in Neo4j

I have a schema like (:A)-[:TYPE_1]-(:B)-[:TYPE_2]-(:A). I need to link [:TYPE_1] and [:TYPE_2] Relationships to certain other Nodes (Say, types C,D,E etc.). I had to create some Nodes without any properties, like (:A)-[:TYPE_1]-(:Action)--(:B)--(:Action)-[:TYPE_2]-(:A). The only purpose of the (:Action) Nodes is to enable me to link the action to some other Nodes (because I can't link a relationship to a Node). Thus, there are no properties associated with them. Since I changed my schema, I am finding that MERGE queries have slowed down incredibly. Obviously, I can't index the (:Action) Nodes, but all other Indexes are in place. What could be going wrong?
Edit:
My logic is as follows:
1) There are multiple CSV files.
2) Each row in each file provides one (a1:A)-[:TYPE_1]-(type_1:Action)--(b:B)--(type_2:Action)-[:TYPE_2]-(a2:A) pattern.
3) Two different files may provide the same a1, a2 and b entities.
4) However, if the file pertains to a1, it will give qualifiers for type_1, and if the file pertains to a2, it will give qualifiers for type_2.
5) Hence, I do an OPTIONAL MATCH to see if the pattern exists.
6) If it doesn't, I create the pattern, qualifying either type_1 or type_2 based on a parameter in the row called qualifier, which can be type_1 or type_2.
7) If it does, then I just qualify type_1 or type_2, as the case may be.
statement = """
MERGE (file:File {id:$file})
WITH file
UNWIND $rows as row
MERGE (a1:A {id:row.a1})
ON CREATE
SET a1.name=row.a1_name
MERGE (a2:A {id:row.a2})
ON CREATE
SET a2.name=row.a2_name
MERGE (b:B {id:row.b})
ON CREATE
SET b.name = row.b_name
MERGE (c:C {id:row.c})
MERGE (d:D {id:row.d})
MERGE (e:E {id:row.e})
MERGE (b)-[:FROM_FILE]->(file)
WITH b,c,d,e,a1,a2,row
OPTIONAL MATCH (a1)-[:TYPE_1]->(type_1:Action)-[:INITIATED]->(b)<-[:INITIATED]-(type_2:Action)<-[:TYPE_2]-(a2)
WITH a1,b,a2,row,c,d,e,type_1,type_2
CALL apoc.do.when(type_1 is null,
"WITH a1,b,a2,row,c,d,e
CALL apoc.do.when(row.qualifier = 'type1',
'CREATE (type_1:Action)
CREATE (type_2:Action)
MERGE (a1)-[:TYPE_1 ]->(type_1)-[:INITIATED]->(b)<-[:INITIATED]-(type_2)<-[:TYPE_2 ]-(a2)
MERGE (type_1)-[:WITH_C]->(c)
MERGE (type_1)-[:WITH_D]->(d)
MERGE (type_1)-[:WITH_E]->(e)',
'CREATE (type_1:Action)
CREATE (type_2:Action)
MERGE (a1)-[:TYPE_1]->(type_1)-[:INITIATED]->(b)<-[:INITIATED]-(type_2)<-[:TYPE_2 ]-(a2)
MERGE (type_2)-[:WITH_C]->(c)
MERGE (type_2)-[:WITH_D]->(d)
MERGE (type_2)-[:WITH_E]->(e)',
{row:row,a1:a1,a2:a2,b:b,c:c,d:d,e:e})
YIELD value
RETURN value",
"
WITH row,c,d,e,type_1,type_2
CALL apoc.do.when(row.qualifier = 'type1',
'MERGE (type_1)-[:WITH_C]->(c)
MERGE (type_1)-[:WITH_D]->(d)
MERGE (type_1)-[:WITH_E]->(e)',
'MERGE (type_2)-[:WITH_C]->(c)
MERGE (type_2)-[:WITH_D]->(d)
MERGE (type_2)-[:WITH_E]->(e)',
{row:row,type_1:type_1,type_2:type_2,c:c,d:d,e:e})
YIELD value
RETURN value",
{row:row,a1:a1,a2:a2,b:b,c:c,d:d,e:e,type_1:type_1,type_2:type_2})
YIELD value
RETURN count(*) as count
"""
params = []
for row in df.itertuples():
    params_dict = {'a1': row[1], 'a1_name': row[-3], 'a2': row[2], 'a2_name': row[-4], 'b_name': row[3], 'b': row[-2], 'c': int(row[6]), 'd': row[7], 'e': row[5], 'qualifier': row[-1]}
    params.append(params_dict)
    if row[0] % 5000 == 0:
        graph.run(statement, parameters={"rows": params, 'file': file})
        params = []
graph.run(statement, parameters={"rows": params, 'file': file})
It's hard to say exactly what the issue is but I do notice that you use MERGE a bit more than you actually need to. In your apoc.do.when call you call
MERGE (a1)-[:TYPE_1 ]->(type_1)-[:INITIATED]->(b)<-[:INITIATED]-(type_2)<-[:TYPE_2 ]-(a2)
even though you know that you just created type_1 and type_2 so none of the relationships exist. If you change that to a CREATE you should see a speedup. The same logic applies to the other MERGE calls in that statement.
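Separately from the MERGE versus CREATE point, the batching loop in the question flushes whenever the row index is divisible by 5000. Assuming the DataFrame has the default integer index starting at 0, that sends the very first row as a batch of one, and the final call can run with an empty list. A minimal sketch of even batching follows; `batched_rows` and `run_batch` are hypothetical helpers, with `run_batch` standing in for the `graph.run(statement, parameters=...)` call.

```python
from itertools import islice


def batched_rows(rows, size):
    """Yield consecutive lists of at most `size` items from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return  # nothing left, so no empty batch is ever yielded
        yield batch


sent = []  # records the size of every batch that was "sent"


def run_batch(batch):
    # Hypothetical stand-in for graph.run(statement, parameters={"rows": batch, ...}).
    sent.append(len(batch))


# 12 rows in batches of 5 give two full batches and one remainder batch.
for batch in batched_rows(range(12), 5):
    run_batch(batch)
```

With this shape there is no special case for the first row and no trailing call with an empty parameter list.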

CREATE works in this statement, but MERGE doesn't. What's going on here?

I am trying this statement.
CREATE (n:TestEntity), (m1:RelatedEntity)
WITH n,m1
MERGE (m2:RelatedEntity {b:"c"})
WITH n,m1,m2
MERGE (n)-[:REL]->(m1), (n)-[:REL]->(m2)
SET n+={a:1}, m1+={b:"d"}, m2+={d:2}
return n, m1,m2;
This gives an error:
If I change the last MERGE to CREATE, this exact statement works.
If I remove the second relationship and only MERGE the first one, it works. What's going on? Is this a bug?
At this time in Neo4j 4.2.x, MERGE does not support comma-separated patterns, though there is a feature request in the backlog to add that capability.
MERGE does not support comma-separated patterns. A workaround is to write a separate MERGE clause for each pattern.
The below query works:
CREATE (n:TestEntity), (m1:RelatedEntity)
WITH n,m1
MERGE (m2:RelatedEntity {b:"c"})
WITH n,m1,m2
MERGE (n)-[:REL]->(m1) MERGE (n)-[:REL]->(m2)
SET n+={a:1}, m1+={b:"d"}, m2+={d:2}
return n, m1,m2;

How to match line of csv which is ignored by constraint and create only relationship

I have created a graph with a constraint on the primary id. In my CSV the primary id is duplicated, but the other properties are different. Based on the other properties, I want to create relationships.
I tried multiple times to change the code but it does not do what I need.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///Trial.csv' AS line FIELDTERMINATOR '\t'
MATCH (n:Trial {id: line.primary_id})
WITH line.cui AS cui
MATCH (m:Intervention)
where m.id = cui
MERGE (n)-[:HAS_INTERVENTION]->(m);
I already have the Intervention nodes in the graph, as well as the trials. So what I am trying to do is match a trial with the id from the intervention and create only the relationship. Instead, it also creates new nodes.
This is a sample of my data: the same primary id with different cuis, and I am trying to match on cui:
You can use the following query, which finds the Trial and Intervention nodes by primary_id and cui respectively and creates the relationship between them.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///Trial.csv' AS line FIELDTERMINATOR '\t'
MATCH (n:Trial {id: line.primary_id}), (m:Intervention {id: line.cui})
MERGE (n)-[:HAS_INTERVENTION]->(m);
The behavior you observed is caused by 2 aspects of the Cypher language:
The WITH clause drops all existing variables except for the ones explicitly specified in the clause. Therefore, since your WITH clause does not specify the n node, n becomes an unbound variable after the clause.
The MERGE clause will create its entire pattern if any part of the pattern does not already exist. Since n is not bound to anything, the MERGE clause would go ahead and create the entire pattern (including the 2 nodes).
So, you could have fixed the issue by simply specifying the n variable in the WITH clause, as in:
WITH n, line.cui AS cui
But @Raj's query is even better, avoiding the need for WITH entirely.
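The scoping rule described above can be sketched with a plain Python dictionary standing in for the set of bound variables. This is only an illustration of the projection behaviour; `with_clause` is a hypothetical helper, not a Neo4j API, and the sample values are made up.

```python
def with_clause(scope, *keep, **aliases):
    """Return a new scope holding only the listed variables plus any aliases.

    Anything not named here is dropped, as with Cypher's WITH.
    """
    new_scope = {name: scope[name] for name in keep}
    new_scope.update(aliases)
    return new_scope


# Variables bound before the WITH: the matched Trial node and the CSV line.
scope = {"n": "Trial#1", "line": {"primary_id": "T1", "cui": "C42"}}

# WITH line.cui AS cui  -> n is no longer in scope afterwards.
dropped = with_clause(scope, cui=scope["line"]["cui"])

# WITH n, line.cui AS cui  -> n is carried forward.
kept = with_clause(scope, "n", cui=scope["line"]["cui"])
```

In the first case a later clause that mentions `n` refers to a fresh, unbound variable, which is why the MERGE created new nodes.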

How to update a relation property in Neo4J Cypher?

I have the following Neo4J Cypher query:
MATCH (u:User {uid: $userId})
UNWIND $contextNames as contextName
MERGE (context:Context {name:contextName.name,by:u.uid,uid:contextName.uid})
ON CREATE SET context.timestamp=$timestamp
MERGE (context)-[:BY{timestamp:$timestamp}]->(u)
The last line always creates a new relation between the context and the u node. However, what if I just want to update it? How do I integrate this logic into the query above?
Do I have to add WITH context,u before the MERGE and then add rel:BY into the query?
Or do MATCH (context)-[rel:BY.... and then update the rel?
Just looking for the most efficient "best practices" way to do that.
Thanks!
There are two possible situations:
A relation between context and u is already present
A relation between context and u is not yet present (this will happen when context was just created by MERGE)
When you run the following line
MERGE (context)-[:BY{timestamp:$timestamp}]->(u)
Neo4j will check whether there is already a BY relation between context and u with the given timestamp value. If there is, no new relation will be created; if the timestamp differs, a new relation is created. The timestamp is not a suitable identifier for matching a relation, especially since you want to update it. Therefore, I recommend changing the query as follows:
MATCH (u:User {uid: $userId})
UNWIND $contextNames as contextName
MERGE (context:Context {name:contextName.name,by:u.uid,uid:contextName.uid})
ON CREATE SET context.timestamp=$timestamp
MERGE (context)-[by:BY]->(u)
SET by.timestamp=$timestamp
This way, a relation will be created if not already present. Either way, the timestamp will be set to the specified value.
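The difference between the two queries can be sketched with a toy in-memory model in Python. The `rels` list and `merge_rel` function are hypothetical names for illustration only and are not part of any Neo4j API.

```python
# Toy model of MERGE on a relationship; NOT the Neo4j driver API.
rels = []  # each relation: {"start", "type", "end", "props"}


def merge_rel(start, rel_type, end, match_props=None):
    """Return a relation matching endpoints, type and ALL match_props, else create it."""
    match_props = match_props or {}
    for rel in rels:
        same_ends = (rel["start"], rel["type"], rel["end"]) == (start, rel_type, end)
        if same_ends and all(rel["props"].get(k) == v for k, v in match_props.items()):
            return rel
    rel = {"start": start, "type": rel_type, "end": end, "props": dict(match_props)}
    rels.append(rel)
    return rel


# Original query: the timestamp is part of the match, so a new value never matches.
merge_rel("context", "BY", "u", {"timestamp": 1})
merge_rel("context", "BY", "u", {"timestamp": 2})
count_with_property = len(rels)

rels.clear()

# Revised query: merge on type and endpoints only, then SET the timestamp.
for ts in (1, 2):
    by = merge_rel("context", "BY", "u")
    by["props"]["timestamp"] = ts
count_without_property = len(rels)
```

In this model the first approach ends with two BY relations and the second with a single relation whose timestamp holds the latest value.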

Cypher 'Node Already Exists' issue with MERGE

I am perplexed as to why I am getting an error with this Cypher statement. I have a unique constraint on the address of the Location node, but I am using a MERGE, which should find the node if it exists and only return the id for the rest of the statement. What am I missing?
Here is my statement:
MERGE(l:Location{location_name:"Starbucks", address:"36350 Van Dyke Ave", city: "Sterling Heights",state: "MI", zip_code:"48312",type:"location",room_number:"",long:-83.028889,lat:42.561152})
CREATE(m:Meetup{meet_date:1455984000,access:"Private",status:"Active",type:"project",did_happen:"",topic:"New features for StudyUup",agenda:"This is a brainstorming session to come with with new ideas for the companion website, StudyUup. Using MatchUup as the base, what should be added, removed, or modified? Bring your thinking caps and ideas!"})
WITH m,l
MATCH (g:Project{title_slug:"studyuup"}) MATCH (p:Person{username:"wkolcz"})
WITH m,l,g,p
MERGE (g)-[:CREATED {rating:0}]->(m)
MERGE (m)-[:MEETUP_AT {rating:0}]->(l)-[:HOSTED_MEETUP]->(m)
MERGE (m)<-[:ATTENDING]-(p)
RETURN id(m) as meeting_id
I am getting:
Node 416 already exists with label Location and property "address"=[36350 Van Dyke Ave]
You've encountered a common misunderstanding of MERGE. MERGE merges on everything you've specified within the single MERGE clause. So the order of operations is:
Search for a :Location node with all of the properties you've specified.
If found, return the node.
If not found, create the node.
Your problem occurs at step 3. Because a node with all of the properties you've specified does not exist, it goes to step 3 and tries to create a node with all of those properties. That's when your uniqueness constraint is violated.
The best practice is to merge on the property that you've constrained to be unique and then use SET to update the other properties. In your case:
MERGE (l:Location {address:"36350 Van Dyke Ave"})
SET l.location_name = "Starbucks",
l.city = "Sterling Heights"
...
The same logic is going to apply for the relationships you're merging later in the query. If the entire pattern doesn't exist, it's going to try to create the entire pattern. That's why you should stick to the best practice of:
MERGE (node1:Label1 {unique_property: "value"})
MERGE (node2:Label2 {unique_property: "value"})
MERGE (node1)-[:REL]-(node2)
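The three steps and the constraint violation can be reproduced with a small in-memory sketch in Python. It is a toy model under stated assumptions: `merge_location`, `ConstraintError` and the pre-existing node are hypothetical, and the uniqueness check on `address` is hand-written to imitate the constraint.

```python
class ConstraintError(Exception):
    """Raised when a create would duplicate the unique 'address' property."""


# Assume one Location already exists, stored without a 'city' property.
nodes = [{"address": "36350 Van Dyke Ave", "location_name": "Starbucks"}]


def merge_location(match_props, set_props=None):
    # Steps 1 and 2: search for a node with ALL of match_props and return it.
    for node in nodes:
        if all(node.get(k) == v for k, v in match_props.items()):
            node.update(set_props or {})  # plays the role of a following SET
            return node
    # Step 3: no full match, so create; the uniqueness constraint is checked here.
    if "address" in match_props and any(
        n.get("address") == match_props["address"] for n in nodes
    ):
        raise ConstraintError("address already exists")
    node = {**match_props, **(set_props or {})}
    nodes.append(node)
    return node


# Merging on every property: no node has this city, so a create is attempted.
try:
    merge_location({"address": "36350 Van Dyke Ave", "city": "Sterling Heights"})
    failed = False
except ConstraintError:
    failed = True

# Merging on the unique property only, then setting the rest, succeeds.
node = merge_location({"address": "36350 Van Dyke Ave"}, {"city": "Sterling Heights"})
```

The first call fails because the full property set has no match, while the second finds the existing node by its unique key and updates it in place.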
