Neo4j: Improve Cypher performance

I have the following Cypher query, where I have to UNWIND around 100 IDs. My problem is that the query takes a long time to execute (about 3-4 minutes).
My Query:
CREATE (pl:List {id: {id}, title: {title} })
WITH pl as pl
MATCH (b:Units {id: {bId} })
MERGE (b)-[rpl:UNITS_LIST]->(pl)
WITH pl as pl
UNWIND {Ids} as Id
MATCH (p:Infos {id: Id})
WITH p as cn, pl as pl
SET cn :Data
WITH pl as pl, cn as cn
MERGE (pl)-[cnpt:DATA_LIST { email: cn.email } ]->(cn)
RETURN pl
Sample Data
List:
{
id: 'some-unique-id',
name: "some-name''
}
Ids (there should be around 100 of them):
[ 'some-info-id-01','some-info-id-03' ]
Infos (Neo4j DB):
[
{ id: 'some-info-id-01', others: 'some-others-data-01' },
{ id: 'some-info-id-02', others: 'some-others-data-02' },
{ id: 'some-info-id-03', others: 'some-others-data-03' }
]
Any suggestions to improve this Cypher query?
P.S. I'm running this Cypher in my Node.js app.

This query looks like it should be pretty fast if you have proper indexes in place.
You should have these in place:
CREATE INDEX ON :Infos(id);
CREATE INDEX ON :Units(id);
CREATE INDEX ON :List(id);
With those indexes in place, the query should be fast: you're mostly looking up nodes by those IDs and then doing a very small amount of work on top of each lookup. Even with 100 IDs, that's not a hard query.
The counterpoint is that if those id fields aren't indexed, Neo4j has to scan most or all of the nodes with those labels to figure out which ones match. The more data you have, the slower this query will get.
If you do have these indexes and you're still seeing very slow performance, EXPLAIN the query and post the plan for further feedback.
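For example (a minimal sketch that reuses the parameter names from the question and drops the redundant WITH pl as pl re-aliasing), you can list the indexes that actually exist and prefix the query with EXPLAIN to inspect the plan without running it, or PROFILE to run it and see db hits per operator:
// List the indexes currently in the database (Neo4j 3.x procedure)
CALL db.indexes();

// Inspect the plan without executing; swap EXPLAIN for PROFILE to also get db hits
EXPLAIN
CREATE (pl:List {id: {id}, title: {title} })
WITH pl
MATCH (b:Units {id: {bId} })
MERGE (b)-[:UNITS_LIST]->(pl)
WITH pl
UNWIND {Ids} as Id
MATCH (cn:Infos {id: Id})
SET cn:Data
MERGE (pl)-[:DATA_LIST { email: cn.email }]->(cn)
RETURN pl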

Related

Apoc.merge.relationship() creates duplicates in Neo4j

I am trying to create a relationship between two nodes using apoc.merge.relationship, but it creates two identical relationships, which I can see when I search. They both have the same direction and all the same properties, even though the query clearly uses newLink.id as the identifier. I hope someone can show me what is wrong with my Cypher query.
UNWIND [{
color:'#82abd1', direction:'true', id:'q', index:0, linkType:'a',
source:'46166.a690c888-e3d5-41ed-8469-79a88cce8388', status:'approved',
target:'46163.a690c888-e3d5-41ed-8469-79a88cce8388', type:'Applies for', value:2
}] AS newLink
MATCH
(fNode:Node {id: newLink.source}),
(sNode:Node {id: newLink.target})
CALL apoc.merge.relationship(
fNode,
'Label',
{id: newLink.id},
apoc.map.clean(newLink, ['id','type'],[]),
sNode,
apoc.map.clean(newLink, ['id','type'],[])
)
YIELD rel
RETURN DISTINCT 'true';
My search query is:
MATCH ()-[rel]-() RETURN COUNT(rel)
My search query was finding each relationship twice, once as (node1)-[rel]-(node2) and once as (node2)-[rel]-(node1). One way to avoid this is to add a condition like ID(node1) > ID(node2), which compares the internal node ids assigned by Neo4j.
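For reference (a minimal sketch, not taken from the original post), a directed pattern already counts each relationship once, and the ID comparison achieves the same thing when the pattern has to stay undirected:
// Count each relationship once by giving the pattern a direction
MATCH ()-[rel]->() RETURN COUNT(rel);

// Or keep the undirected match but retain only one orientation per node pair
MATCH (n1)-[rel]-(n2)
WHERE ID(n1) > ID(n2)
RETURN COUNT(rel)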

Checking every relationship of a path triggers way too many dbhits

I am running a query to find the path from a user to a permission node in Neo4j, and I want to make sure it doesn't traverse certain relationships. Therefore I've used the following Cypher:
PROFILE
MATCH path = (u:user { _id: 'ea6b17e0-3b9e-11ea-b206-e7610aa23593' })-[r:accessRole|isMemberOf*1..5]->(n:PermissionSet { name: 'project'})
WHERE all(x IN r WHERE NOT (:PermissionSet)-[x]->(:user))
RETURN path
I didn't expect the WHERE clause to trigger so many hits; I believe I'm not writing my test correctly. (2,000 nodes / 3,500 rels => 350,000 db hits for "(:PermissionSet)-[x]->(:user)".)
Any advice?
Not sure this is the correct answer, but I added a WITH clause:
PROFILE
MATCH path = (u:user { _id: 'ea6b17e0-3b9e-11ea-b206-e7610aa23593' })-[r:accessRole|isMemberOf*1..5]->(n:PermissionSet { name: 'project'})
WITH path,r
WHERE all(x IN r WHERE NOT (:PermissionSet)-[x]->(:user))
RETURN path
And the dbhits for "(:PermissionSet)-[x]->(:user)" went down to 2800 hits.
I can guess why that happens, but I'd love a more expert explanation. Also, is there a better way to do it? (This way is fine with me performance-wise.)
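Not from the original answer, but one alternative worth profiling against the same data: test the endpoints of each bound relationship with startNode()/endNode() and labels() instead of re-matching the (:PermissionSet)-[x]->(:user) pattern, so no extra pattern expansion is needed per relationship:
PROFILE
MATCH path = (u:user { _id: 'ea6b17e0-3b9e-11ea-b206-e7610aa23593' })-[r:accessRole|isMemberOf*1..5]->(n:PermissionSet { name: 'project'})
WHERE all(x IN r WHERE NOT ('PermissionSet' IN labels(startNode(x)) AND 'user' IN labels(endNode(x))))
RETURN path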

Trying to batch merge nodes that are the same

I have about 4.7 million "entity nodes." Many of these are duplicate entities. I want to merge the entities that are the same and keep the relationship(s) between those new combined entities and the things they are connected to in place. I wrote the below query to try and do this, but it does not seem to be working. Any assistance with this is greatly appreciated.
CALL apoc.periodic.iterate(
'MATCH (e:Entity)
WITH e.name AS name, e.entity_type AS type, collect(e) as nodes
CALL apoc.refactor.mergeNodes(nodes, {
properties: {
author_id:"combine",
author_name:"combine",
entity_hash:"combine",
entity_type:"combine",
forum_id:"combine",
name:"discard",
post_id:"combine",
thread_id:"combine"
}
}) YIELD node
RETURN count(node) AS new_node_count',
'',
{batchSize:100000}
)
The pinwheel keeps spinning, but there is no reduction in nodes or anything else, which tells me it's hung.
You are not using the apoc.periodic.iterate procedure correctly. This procedure takes two queries:
the first builds the population of elements you will iterate over
the second says what to do with each element produced by the first query
So in your case, the query should be:
CALL apoc.periodic.iterate(
'MATCH (e:Entity)
WITH e.name AS name, e.entity_type AS type, collect(e) as nodes
RETURN nodes',
'CALL apoc.refactor.mergeNodes(nodes, {
properties: {
author_id:"combine",
author_name:"combine",
entity_hash:"combine",
entity_type:"combine",
forum_id:"combine",
name:"discard",
post_id:"combine",
thread_id:"combine"
}
})',
{batchSize:500}
)
Moreover, I have decreased the batch size to 500, because if you have a lot of identical nodes, 500 works well (or 1000, but not 100000, otherwise you will run out of memory).
To gauge the performance of this query, you can first run the driving (first) query on its own to see whether it is fast.
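For example, a minimal sketch of such a dry run, which only inspects the grouping that the first query would feed into the second:
// How many duplicate groups are there, and how large is the biggest one?
MATCH (e:Entity)
WITH e.name AS name, e.entity_type AS type, collect(e) AS nodes
RETURN count(*) AS groups, max(size(nodes)) AS largest_group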

Creating 200K relationships to a node is taking a lot of time in Neo4j 3.5?

I have one vertex like this
Vertex1
{
name:'hello',
id: '2',
key: '12345',
col1: 'value1',
col2: 'value2',
.......
}
Vertex2, Vertex3, ..... Vertex200K
{
name:'hello',
id: '1',
key: '12345',
col1: 'value1',
col2: 'value2',
.......
}
Cypher Query
MATCH (a:Dense1) where a.id <> "1"
WITH a
MATCH (b:Dense1) where b.id = "1"
WITH a,b
WHERE a.key = b.key
MERGE (a)-[:PARENT_OF]->(b)
The end result should be that Vertex1 has a degree of 200K, i.e. there should be 200K relationships. However, the above query takes a lot of time, dropping throughput to roughly 500 relationships/second. Any ideas on how to create the relationships/edges more quickly?
When I run PROFILE on the Cypher query above, it keeps running forever and doesn't return, so I reduced the size from 200K to 20K and profiled that instead.
Given your memory constraints, and the high db hits associated with your MERGE of the relationships, the issue is likely that you're trying to MERGE 200k relationships in a single transaction. You should probably batch this by using apoc.periodic.iterate() from APOC Procedures:
CALL apoc.periodic.iterate("
MATCH (a:Dense1)
WHERE a.id <> '1'
MATCH (b:Dense1)
WHERE b.id = '1' AND a.key = b.key
RETURN a, b",
"MERGE (a)-[:PARENT_OF]->(b)",
{}) YIELD batches, total, errorMessages
RETURN batches, total, errorMessages
This should batch those merges 10k at a time.
Also, if you happen to know for a fact that those relationships don't yet exist, use CREATE instead of MERGE; it will be faster.
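If that is the case, the second statement of the iterate call is the only part that changes; a sketch of the same call with CREATE swapped in:
// CREATE is safe only when no :PARENT_OF relationship can already exist between a pair
CALL apoc.periodic.iterate("
MATCH (a:Dense1)
WHERE a.id <> '1'
MATCH (b:Dense1)
WHERE b.id = '1' AND a.key = b.key
RETURN a, b",
"CREATE (a)-[:PARENT_OF]->(b)",
{}) YIELD batches, total, errorMessages
RETURN batches, total, errorMessages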
Create an index on the properties you are using for matching, here the id and key properties. You can create the indexes with the following queries (using the :Dense1 label from the question):
CREATE INDEX ON :Dense1(id);
CREATE INDEX ON :Dense1(key);
This is the first step to improve performance.
You can further improve with a few other tricks.
Can you try running
MATCH (b:Dense1) where b.id <> "1"
WITH b, b.key AS bKey
MATCH (a:Dense1) where a.id = "1" AND a.key = bKey
MERGE (a)-[:PARENT_OF]->(b)
after ensuring you have indexes on id and key?
Also, do I have this right that id is NOT unique, and you have 1 node with id = 2 and 200k nodes with id = 1? If I got that wrong, flip the conditions so that the first line returns the single node (the one you want all the relationships coming into) and the second part matches the remaining 200k nodes. Also, in the MERGE, put the low-density node first (so here, b would get the 200k incoming relationships); if that's not right, reverse it to (b)<-[:XXX]-(a).
It's been a while since I dealt with large imports/merges, but I recall that explicitly extracting the variable (e.g. bKey) so it can be matched against the index, and starting from a single node (or just a few b's) before moving on to the many a's, worked better than queries with WHERE clauses like a.key = b.key.
Having said that, 200k relationships in one transaction, all attached to a single node, is a lot: matching on the index finds the nodes quickly, but each MERGE still has to check that node's existing relationships to see whether one already links to the other node. So by the time you create the last relationship, you are checking nearly 200k relationships.
One trick is running batches in a loop until nothing gets created, e.g.
MATCH (b:Dense1) where b.id = "1"
WITH b, b.key AS bKey
MATCH (a:Dense1) where a.id <> "1" AND a.key = bKey
AND NOT (a) -[:PARENT_OF]-> (b) WITH a,b LIMIT 10000
MERGE (a)-[:PARENT_OF]->(b)
This will probably show that the later the batch, the longer it takes, which makes sense logically: the further you go, the more existing relationships attached to b have to be checked.
Or, as shown in other responses, batch via APOC.
Last thing: is this supposed to be an ongoing process, or a one-time setup/initialisation of the DB? There are other, dramatically faster options if it's for an initial load only.

Limiting ShortestPath in Cypher to nodes with specific properties

I am trying to figure out how to limit a shortestPath query in Cypher so that it only connects "Person" nodes containing a specific property.
Here is my query:
MATCH p = shortestPath( (from:Person {id: 1})-[*]-(to:Person {id: 2})) RETURN p
I would like to limit it so that when it connects from one Person node to another Person node, the Person node has to contain a property called "job" with the value "engineer".
Could you help me construct the query? Thanks!
Your requirements are not very clear, but if you simply want one of the people to have an id of 1 and the other person to be an engineer, you would use this:
MATCH p = shortestPath( (from:Person {id: 1})-[*]-(to:Person {job: "engineer"}))
RETURN p;
This kind of query should be much faster if you also create indexes on the id and job properties of Person.
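If, on the other hand, you meant that every node along the path must be an engineer, one way to express that (a sketch, assuming the intermediate nodes carry a job property, which the question doesn't state) is to filter the path's nodes:
MATCH p = shortestPath( (from:Person {id: 1})-[*]-(to:Person {id: 2}) )
WHERE all(x IN nodes(p) WHERE x.job = "engineer")
RETURN p
Be aware that predicates applied to the whole path can force Neo4j to fall back to a slower, exhaustive search for a qualifying path.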
