Neo4j: How to speed up a query with multiple merges?

I have a query that creates a node, and relates a large number of nodes to it.
Example:
CREATE (target :x {index:'a'})
WITH target
MERGE (x1:x {index:'1'}) MERGE (x1)-[:r]->(target)
MERGE (x2:x {index:'2'}) MERGE (x2)-[:r]->(target)
MERGE (x3:x {index:'3'}) MERGE (x3)-[:r]->(target)
...
MERGE (x1000:x {index:'1000'}) MERGE (x1000)-[:r]->(target)
I have already set up an index with CREATE CONSTRAINT ON (x:x) ASSERT x.index IS UNIQUE. However, this query is currently taking ~45 minutes to complete.
Is there anything I can do to speed it up? Is adding more CPU power the only option from here?

When you stack MERGE or MATCH statements like that, you can end up with performance issues (related to result rows). For a case like this, use an iterative loop:
CREATE (target :x {index:'a'})
WITH target
FOREACH(i IN RANGE(1, 1000)|
MERGE (a:x {index: toString(i)})
MERGE (a) - [:r] -> (target) )
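If the index values are not a simple consecutive range, the list can be built on the client and passed in as a parameter. The following is only a sketch in Python, assuming a Neo4j version that accepts the `$param` syntax; the parameter names `targetIndex` and `indexes` and the helper `build_merge_request` are made up for illustration, and the query text is only assembled here, not executed. Note that Cypher's RANGE(1, 1000) includes both ends, so the Python equivalent is range(1, 1001).

```python
from typing import Dict, List, Tuple

# Same shape as the FOREACH query above, but the list comes from a parameter.
# Assumption: the server accepts $param syntax; parameter names are hypothetical.
QUERY = """
CREATE (target:x {index: $targetIndex})
WITH target
FOREACH (idx IN $indexes |
  MERGE (a:x {index: idx})
  MERGE (a)-[:r]->(target))
"""


def build_merge_request(target_index: str, first: int, last: int) -> Tuple[str, Dict[str, object]]:
    """Return the query text and a parameter map covering first..last inclusive."""
    # Cypher's RANGE is inclusive on both ends, hence last + 1 here.
    indexes: List[str] = [str(i) for i in range(first, last + 1)]
    return QUERY, {"targetIndex": target_index, "indexes": indexes}


query, params = build_merge_request("a", 1, 1000)
```

The query stays a few lines long no matter how many nodes are linked, so only the parameter list grows.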

Related

AND/OR trees queries in Neo4j

I'm representing AND/OR trees in Neo4j as in the figure below (with AND and OR nodes and corresponding relationships). The tree's leaves are properties representing boolean values (green = true, red = false).
Now I'm looking for an algorithm (or Cypher query) that traverses each tree and determines whether there is at least one valid path (a set of leaves that satisfies the AND/OR conditions) from the leaves to the tree's root. For example, in the figure there is such a path, so the algorithm should return true (and perhaps also the nodes that are part of the path).
[Figure: AND/OR tree in Neo4j]
example data:
MERGE (a:ROOT_NODE{key:'a'})
MERGE (b:AND_NODE{key:'b'})
MERGE (c:OR_NODE{key:'c'})
MERGE (d:OR_NODE{key:'d'})
MERGE (e:AND_NODE{key:'e'})
MERGE (f:RED{key:'f'})
MERGE (g:GREEN{key:'g'})
MERGE (h:RED{key:'h'})
MERGE (i:RED{key:'i'})
MERGE (j:GREEN{key:'j'})
MERGE (k:GREEN{key:'k'})
MERGE (k)-[:AND]->(e)
MERGE (j)-[:AND]->(e)
MERGE (e)-[:OR]->(d)
MERGE (i)-[:OR]->(d)
MERGE (f)-[:OR]->(c)
MERGE (g)-[:OR]->(c)
MERGE (h)-[:OR]->(c)
MERGE (c)-[:AND]->(b)
MERGE (d)-[:AND]->(b)
MERGE (b)-[:OR]->(a)
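No answer is recorded for this question. One possible approach, sketched in Python rather than Cypher, is a bottom-up recursive evaluation over the example data above. It assumes that GREEN leaves are true and RED leaves are false, that an AND_NODE needs all of its children satisfied, and that an OR_NODE (and the ROOT_NODE, whose incoming relationship is of type OR) needs at least one. The names `LABELS`, `EDGES`, `children_of` and `satisfying_leaves` are made up for this illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# Node labels and child -> parent relationships copied from the example data.
LABELS: Dict[str, str] = {
    "a": "ROOT_NODE", "b": "AND_NODE", "c": "OR_NODE", "d": "OR_NODE",
    "e": "AND_NODE", "f": "RED", "g": "GREEN", "h": "RED",
    "i": "RED", "j": "GREEN", "k": "GREEN",
}
EDGES: List[Tuple[str, str]] = [
    ("k", "e"), ("j", "e"), ("e", "d"), ("i", "d"), ("f", "c"),
    ("g", "c"), ("h", "c"), ("c", "b"), ("d", "b"), ("b", "a"),
]


def children_of(edges: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group child keys by their parent key."""
    children: Dict[str, List[str]] = {}
    for child, parent in edges:
        children.setdefault(parent, []).append(child)
    return children


def satisfying_leaves(key: str, labels: Dict[str, str],
                      children: Dict[str, List[str]]) -> Optional[Set[str]]:
    """Return one set of GREEN leaves that satisfies `key`, or None if none exists."""
    label = labels[key]
    if label == "GREEN":
        return {key}
    if label == "RED":
        return None
    results = [satisfying_leaves(c, labels, children) for c in children.get(key, [])]
    if label == "AND_NODE":
        # Assumption: an AND node with no children counts as unsatisfied.
        if not results or any(r is None for r in results):
            return None
        return set().union(*results)
    # OR_NODE and ROOT_NODE: the first satisfied child is enough.
    for r in results:
        if r is not None:
            return r
    return None


witness = satisfying_leaves("a", LABELS, children_of(EDGES))
tree_is_satisfied = witness is not None
```

For the example data this finds the leaves g, j and k, so the tree is satisfied. The recursion assumes the data really is a tree; a graph with cycles would need a visited set.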

Make all-to-all relationships among the elements present in a list in Neo4j

The same question is also posted at https://community.neo4j.com/t/make-all-to-all-relationship-among-the-element-present-in-list/37390
I have a CSV file with two columns, source and target.
I need to create all-to-all relationships between all the elements in the target column for each respective source.
Here is my query; I am not able to create the all-to-all relationships.
LOAD CSV with headers FROM 'file:///sample.csv' AS row
unwind (split(row.target,'|')) as target
merge (n:MyNode{name:row.source})
merge (m:MyNode{name:target})
merge (m) -[:TO{weight:row.weight}]->(n)
merge (m) -[:r]-(m) // not sure about this line
Unfortunately the Eager operations in the plan will complicate this, making you unable to use USING PERIODIC COMMIT LOAD CSV (which you would need for processing any large CSV).
In Neo4j 4.1 or 4.2 you could use subqueries to get around this, but that will not hold for 4.3 and above.
With subqueries:
LOAD CSV with headers FROM 'file:///sample.csv' AS row
MERGE (n:MyNode{name:row.source})
WITH row, n
CALL {
WITH row, n
UNWIND (split(row.target,'|')) as target
MERGE (m:MyNode{name:target})
MERGE (m) -[:TO{weight:row.weight}]->(n)
WITH collect(m) as targets
UNWIND targets as t1
UNWIND targets as t2
WITH t1, t2
WHERE id(t1) < id(t2)
MERGE (t1)-[:r]-(t2)
RETURN true as result
}
RETURN result
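The double UNWIND with the id(t1) < id(t2) filter in the query above is what keeps each pair of targets from being processed twice and prevents a node from being paired with itself. A small Python model of that filter, using made-up integer ids in place of Neo4j's internal node ids, shows that it yields exactly the unordered pairs:

```python
from itertools import combinations
from typing import List, Tuple


def unordered_pairs(ids: List[int]) -> List[Tuple[int, int]]:
    """Mimic UNWIND targets AS t1 UNWIND targets AS t2 ... WHERE id(t1) < id(t2)."""
    return [(t1, t2) for t1 in ids for t2 in ids if t1 < t2]


# Hypothetical internal ids of the collected target nodes.
target_ids = [11, 12, 15, 20]
pairs = unordered_pairs(target_ids)

# n targets give n * (n - 1) / 2 pairs instead of n * n rows.
expected_count = len(target_ids) * (len(target_ids) - 1) // 2
```

Without the filter, four targets would produce sixteen rows, including four self-pairs and every remaining pair in both orders.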
For versions 4.0 and below, subqueries are not available, and in versions 4.3.x (not yet released) and above, subqueries no longer work around Eager operators, so this won't work.
Instead, you could use apoc.cypher.doIt() in place of the subquery, which will work around the Eager (but you'll have to work with a Cypher query string), or you can do 3 passes through your CSV:
First pass to MERGE the source nodes.
Second pass to only split() and MERGE the target nodes.
Third pass to MATCH both the source and target nodes and MERGE the relationships between them.
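The three passes can be modeled in plain Python to make the division of work concrete. This is only an in-memory sketch, not an import script: the sample rows are invented, the column names source, target and weight are taken from the question's query, and Python sets stand in for the graph.

```python
import csv
import io
from typing import Set, Tuple

# Invented sample data; weight values are strings, as they are with LOAD CSV.
SAMPLE_CSV = "source,target,weight\nd1,a|b|c,2\nd2,b|d,5\n"


def read_rows(csv_text: str) -> csv.DictReader:
    """Start a fresh pass over the CSV text."""
    return csv.DictReader(io.StringIO(csv_text))


def three_pass_import(csv_text: str) -> Tuple[Set[str], Set[Tuple[str, str, str]]]:
    nodes: Set[str] = set()
    relationships: Set[Tuple[str, str, str]] = set()

    # Pass 1: create only the source nodes.
    for row in read_rows(csv_text):
        nodes.add(row["source"])

    # Pass 2: split the target column and create only the target nodes.
    for row in read_rows(csv_text):
        nodes.update(row["target"].split("|"))

    # Pass 3: look up existing nodes (the MATCH step) and add relationships.
    for row in read_rows(csv_text):
        for target in row["target"].split("|"):
            if row["source"] in nodes and target in nodes:
                relationships.add((target, row["source"], row["weight"]))

    return nodes, relationships


nodes, relationships = three_pass_import(SAMPLE_CSV)
```

Each pass either creates nodes or creates relationships between nodes that already exist, never both, which is the separation the three-pass suggestion relies on.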
You could also do this after the import rather than during it, with something like this:
match (s:Source)-[:TO]-(t:Target)
with s, collect (t) as targets
unwind targets as target
// the id filter skips self-pairs and mirrored pairs, so no node is linked to itself
foreach (n in [x in targets where id(x) < id(target)] | merge (n)-[:r]-(target))
LOAD CSV with headers FROM 'file:///sample.csv' AS row
MERGE (n:MyNode:domain{name:row.source})
WITH row, n
CALL {
WITH row, n
UNWIND (split(row.target,'|')) as target
MERGE (m:MyNode:token{name:target})
MERGE (m) -[:TO{weight:row.weight}]->(n)
WITH collect(m) as targets
UNWIND targets as t1
UNWIND targets as t2
WITH t1, t2
WHERE id(t1) < id(t2)
MERGE (t1)-[:token_join]-(t2)
RETURN true as result
}
WITH n
CALL{
WITH n
match (d1:domain)
match (d2:domain)
WITH d1,d2
WHERE id(d1) < id(d2)
MERGE (d1)-[:domain_join]-(d2)
return true as result
}
match (nodes) return nodes
(The query above is a slightly modified version of the earlier subquery code.)

Create node on condition

I'm trying to make a relationship between nodes in Neo4j if a certain condition is met. Currently I have node(a) and node(b):
What I want
if node(b) is in label1 then make relation: node(a)-[:r]-node(b:label1)
else merge node(b) in label2 then make relation node(a)-[:r]-node(b:label2)
What I have
match (a:label1 {id:"t1"})
merge (b:label1 {id:"t6"})
on create
set b:label2 remove b:label1
merge (a)-[:Friends_with]-(b)
Unfortunately, Cypher doesn't really have an if-then-else syntax (ON MATCH and ON CREATE are the closest you will get). I would recommend running multiple Cypher queries and executing follow-ups based on the returned result.
So, something like:
MATCH (a:label1 {id:"t1"})
MATCH (b:label1 {id:"t6"})
MERGE (a)-[:Friends_with]-(b)
return COUNT(b) as b1Exists
And if that returns 0, then execute
MATCH (a:label1 {id:"t1"})
MERGE (b:label2 {id:"t6"})
MERGE (a)-[:Friends_with]-(b)
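The client-side control flow for these two queries might look like the Python sketch below. The `run` callable is a stand-in for whatever driver call executes a query and returns its rows as dictionaries; it is an assumption for illustration, as are the helper name `link_or_create` and the `$aId`/`$bId` parameters (which assume a Neo4j version with `$param` syntax).

```python
from typing import Callable, Dict, List

Row = Dict[str, int]
Runner = Callable[[str, Dict[str, str]], List[Row]]

LINK_EXISTING = """
MATCH (a:label1 {id: $aId})
MATCH (b:label1 {id: $bId})
MERGE (a)-[:Friends_with]-(b)
RETURN count(b) AS b1Exists
"""

CREATE_FALLBACK = """
MATCH (a:label1 {id: $aId})
MERGE (b:label2 {id: $bId})
MERGE (a)-[:Friends_with]-(b)
"""


def link_or_create(run: Runner, a_id: str, b_id: str) -> str:
    """Link a to an existing label1 node, else merge a label2 node and link that.

    Returns the label of the node that ended up linked.
    """
    params = {"aId": a_id, "bId": b_id}
    rows = run(LINK_EXISTING, params)
    # count() yields a single row holding 0 when nothing matched.
    if rows and rows[0]["b1Exists"] > 0:
        return "label1"
    run(CREATE_FALLBACK, params)
    return "label2"
```

Because the check and the fallback are separate statements, they would need to run in one transaction if another client could create the label1 node in between.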
Depending on your data, you might get away with this
MATCH (a:label1 {id:"t1"})
MERGE (b {id:"t6"})
ON CREATE SET b:label2
MERGE (a)-[:Friends_with]-(b)
But be aware that doing this in a single Cypher query will probably result in bugs (in this case, if there are more valid labels than label1 and label2). You will want a unique ID, such as a UUID, for this method.

Give a specific pattern from a node during clause

I have the nodes: (a:charlie), (b:economy), and (c:bicycle) . I want to create this pattern:
create (a:charlie)-[x:wants_make]->(b:economy)-[y:by_using]->(c:bicycle)
But it gives me a Cartesian product. I already thought of skipping the creation of node (b) by giving the relation [x:wants_make] a property. But node (b) has many other relations in the same context (economic context). What I want is to get the pattern above.
Any suggestion?
If your query looks like this:
MATCH (a:charlie), (b:economy), (c:bicycle)
MERGE (a)-[:wants_make]->(b)-[:by_using]->(c);
then it is saying both of these things:
Create a wants_make relationship between every charlie node and every economy node.
Create a by_using relationship between every economy node and every bicycle node.
So, if the numbers of charlie, economy, and bicycle nodes are C, E, and B, this results in (C * E * B) merges, one per combination, which is a Cartesian product of a Cartesian product.
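The count above can be checked with a short Python model. The node names are invented; `itertools.product` plays the role of the comma-separated MATCH, which produces one row per combination of matched nodes.

```python
from itertools import product
from typing import Sequence


def merge_invocations(charlies: Sequence[str],
                      economies: Sequence[str],
                      bicycles: Sequence[str]) -> int:
    """Count the rows the MATCH produces, i.e. how many times the MERGE runs."""
    return sum(1 for _ in product(charlies, economies, bicycles))


# Invented node sets: C = 2, E = 3, B = 4.
row_count = merge_invocations(["a1", "a2"],
                              ["b1", "b2", "b3"],
                              ["c1", "c2", "c3", "c4"])
```

With 2, 3 and 4 nodes the MERGE already runs 24 times, and the number grows multiplicatively as any of the three sets grows.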
Also, your data model seems to be wrong. For example, it seems much more reasonable to have a Person label instead of a charlie label.
A more reasonable query might look something like this:
MERGE (a:Person {name: 'Charlie Brown'})
MERGE (c:Bicycle {id: 123})
MERGE (a)-[:wants_make]->(b:Economy)
MERGE (b)-[:by_using]->(c);
This query avoids Cartesian products by being more specific about the first and last nodes in the path, and it also avoids creating nodes and relationships that already exist.
And, going even further, you might want to combine wants_make, Economy, and by_using into a single economizes_by_using relationship:
MERGE (a:Person {name: 'Charlie Brown'})
MERGE (c:Bicycle {id: 123})
MERGE (a)-[:economizes_by_using]->(c);
You might need to break up your query a bit, since MERGE takes a single pattern at a time:
MATCH (a:charlie), (b:economy), (c:bicycle)
MERGE (a)-[:wants_make]->(b)
MERGE (b)-[:by_using]->(c)

Is it better to make one MERGE request for multiple nodes / edges creation in Neo4j Cypher 2.0 or to split it into transactions?

I have a long Cypher query (the new Neo4j 2.0 version), which creates multiple nodes and connections using the MERGE command.
The question is: do you think I'm better off splitting it into different parts and submitting it as a transaction (for robustness) or should I keep the long single one (for speed)?
Here's the query:
MATCH (u:User {name: "User"}) MERGE (tag1:Hashtag {name:"tag1"}) MERGE (tag2:Hashtag
{name:"tag2"}) MERGE (tag3:Hashtag {name:"tag3"}) MERGE (tag4:Hashtag {name:"tag4"})
MERGE tag1-[:BY]->u MERGE tag2-[:BY]->u MERGE tag3-[:BY]->u MERGE tag4-[:BY]->u;
(I purposefully made the request shorter, imagine that there are like 50 tags (nodes) and even more edges for example)
As long as your query statement is not hundreds of lines long and the data it creates doesn't exceed 50k elements, I'd stick with one query.
But you should use parameters instead.
I would also rewrite your query with foreach and parameters
MATCH (u:User {name: {userName}})
FOREACH (tagName in {tags} |
MERGE (tag:Hashtag {name:tagName})
MERGE (tag)-[:BY]->(u)
)
params:
{userName:"User", tags: ["tag1",...,"tagN"]}
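If the tag list ever grows past what is comfortable for a single transaction, the same parameterized query can be run once per batch. The Python sketch below only builds the parameter maps; the helper name `build_batches` and the default batch size are arbitrary choices for illustration, not values from the answer.

```python
from typing import Dict, List


def build_batches(user_name: str, tags: List[str],
                  batch_size: int = 1000) -> List[Dict[str, object]]:
    """Split the tags into parameter maps of at most batch_size tags each."""
    return [
        {"userName": user_name, "tags": tags[start:start + batch_size]}
        for start in range(0, len(tags), batch_size)
    ]


# 50 tags as in the question, split into batches of 20 for demonstration.
all_tags = [f"tag{i}" for i in range(1, 51)]
batches = build_batches("User", all_tags, batch_size=20)
```

Each map has the same shape as the params shown above, so the query text does not change between batches; because the query uses MERGE, re-running a batch does not create duplicates.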
