I have a Cypher query:
PROFILE MATCH (dg:DecisionGroup {id: -2})-[rdgd:CONTAINS]->(childD:Profile )
WITH childD
RETURN count(childD)
Cypher version: CYPHER 4.4, planner: COST, runtime: INTERPRETED. 20003 total db hits in 14 ms
and the second query:
PROFILE MATCH (dg:DecisionGroup {id: -2})-[rdgd:CONTAINS]->(childD:Profile)
MATCH (childD)-[:CONTAINS]->(childDStat:JobableStatistic)
WITH childD
RETURN count(childD)
Cypher version: CYPHER 4.4, planner: COST, runtime: INTERPRETED. 224367 total db hits in 68 ms.
As you can see, DB hits increase from 20003 to 224367. But I have a one-to-one relationship between childD and childDStat, with 10k childD nodes and 10k childDStat nodes. What am I doing wrong in my query, and how can I decrease the DB hits?
Using multiple relationship types can help you optimize your queries, especially if you are only counting relationships and not doing anything else. What I've seen in practice is very specific relationship types, such as:
(dg:DecisionGroup {id: -2})-[:DECISIONGROUP_HAS_PROFILE]->(childD:Profile )
Then you can quickly count relationships by utilizing the relationship count store:
PROFILE MATCH (dg:DecisionGroup {id: -2})
WITH dg, size((dg)-[:DECISIONGROUP_HAS_PROFILE]->()) AS c
RETURN sum(c) AS result
Take a look at: https://neo4j.com/developer/kb/fast-counts-using-the-count-store/
It seems they have added a few more Cypher options to access the count store, but anyway, count store is much more performant than expanding each relationship.
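As a rough illustration of what the count store can answer directly, the queries below are shapes that the planner can typically serve from it without touching individual nodes or relationships. This is only a sketch reusing the labels from the question. Whether the plan really shows NodeCountFromCountStore or RelationshipCountFromCountStore depends on your Neo4j version, so confirm it with PROFILE or EXPLAIN.

```cypher
// Count of all nodes with one label:
// usually planned as NodeCountFromCountStore.
MATCH (p:Profile)
RETURN count(p) AS profiles;

// Count of all relationships of one type, with at most one labeled end node:
// usually planned as RelationshipCountFromCountStore.
MATCH (:DecisionGroup)-[r:CONTAINS]->()
RETURN count(r) AS containsFromDecisionGroups;
```

Note that these are whole-graph counts; they cannot be narrowed to the single DecisionGroup with id -2.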
You can get creative with more "complex" queries and rewrite the
PROFILE MATCH (dg:DecisionGroup {id: -2})-[rdgd:CONTAINS]->(childD:Profile)
MATCH (childD)-[:CONTAINS]->(childDStat:JobableStatistic)
WITH childD
RETURN count(childD)
into
PROFILE MATCH (dg:DecisionGroup {id: -2})-[rdgd:CONTAINS]->(childD:Profile)
WITH childD, size((childD)-[:CONTAINS]->()) AS count
RETURN sum(count) AS result
Notice that you are not checking the label of the node at the end of the relationship, so your model must ensure that is always correct.
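If the model cannot guarantee that, one possible variant keeps the label check by counting inside a pattern comprehension. This is a sketch that has not been measured against this data, and it will probably cost more db hits than the unlabeled size(...) form because each end node has to be inspected.

```cypher
PROFILE MATCH (dg:DecisionGroup {id: -2})-[:CONTAINS]->(childD:Profile)
// The pattern comprehension yields one element per matching JobableStatistic node,
// so size(...) is the number of label-checked CONTAINS relationships of childD.
WITH childD, size([(childD)-[:CONTAINS]->(s:JobableStatistic) | 1]) AS cnt
RETURN sum(cnt) AS result
```

On Neo4j 5, where size() no longer accepts a bare pattern, the per-node count would instead be written as COUNT { (childD)-[:CONTAINS]->(:JobableStatistic) }.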
I have been experimenting with pattern comprehensions for optimization, but I seem to be getting even more confused.
Here is my initial query:
MATCH (a:Actor)-[:ACTED_IN]->(m:Movie)
WHERE 2000 <= m.year <= 2005 AND a.born.year >= 1980
RETURN a.name AS Actor, a.born AS Born,
collect(DISTINCT m.title) AS Movies ORDER BY Actor
from profiling, I am getting:
Cypher version: , planner: COST, runtime: PIPELINED. 41944 total db hits in 152 ms.
I attempted the following rewrite:
profile MATCH (a:Actor)
WHERE a.born.year >= 1980
// Add a WITH clause to create the list using pattern comprehension
with a
match (a)-[:ACTED_IN]-(m:Movie)
where 2000 <= m.year <= 2005
// filter the result of the pattern comprehension to return only lists with elements
// return the Actor, Born, and Movies
return a.name as Actor, a.born as Born, [(a)-[:ACTED_IN]-(m) | m.title] as Movies
order by a
from profiling, I am getting:
Cypher version: , planner: COST, runtime: PIPELINED. 47879 total db hits in 47 ms.
Then I try another rewrite:
profile MATCH (a:Actor)
WHERE a.born.year >= 1980
// Add a WITH clause to create the list using pattern comprehension
// filter the result of the pattern comprehension to return only lists with elements
// return the Actor, Born, and Movies
with a, [ (a)-[:ACTED_IN]-(m:Movie) where 2000 <= m.year <= 2005 | m.title] as Movies
return a.name as Actor, a.born as Born, Movies
order by a
Cypher version: , planner: COST, runtime: PIPELINED. 59251 total db hits in 6 ms.
Each rewrite uses more DB hits than the one before. I can review the query plans to understand the differences, but is there a way to use pattern comprehension to actually reduce my DB hits compared to the initial query using collect?
Please show us the profile result of your last query; I tested it on the Movie database and it worked well compared with the original query (43 db hits vs. 120 for the original). Also, check whether Actor.born.year has an index.
profile MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WHERE 2000 <= m.released <= 2005 AND a.born >= 1980
RETURN a.name AS Actor, a.born AS Born,
collect(DISTINCT m.title) AS Movies ORDER BY Actor
planner: COST, runtime: PIPELINED. 120 total db hits in 9 ms
profile MATCH (a:Person)
WHERE a.born >= 1980
RETURN a.name AS Actor, a.born AS Born,
[(a)-[:ACTED_IN]-(m:Movie) where 2000 <= m.released <= 2005 | m.title] AS Movies ORDER BY Actor
planner: COST, runtime: PIPELINED. 43 total db hits in 6 ms
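On the index question: a predicate on a.born.year generally cannot use an index on a.born, because the index stores the whole value rather than its year component. Below is a sketch of one way around that. It assumes born is stored as a Date (as in the Actor query in the question) and uses a hypothetical index name, actor_born. It also filters out actors whose list is empty, which the pattern comprehension would otherwise return.

```cypher
// Hypothetical index name; this syntax applies to recent Neo4j 4.x and 5.x.
CREATE INDEX actor_born IF NOT EXISTS FOR (a:Actor) ON (a.born);

// Compare the whole date instead of a.born.year so the planner can consider an index seek.
PROFILE MATCH (a:Actor)
WHERE a.born >= date('1980-01-01')
WITH a, [(a)-[:ACTED_IN]->(m:Movie) WHERE 2000 <= m.year <= 2005 | m.title] AS Movies
// Keep only actors that have at least one movie in the range.
WHERE size(Movies) > 0
RETURN a.name AS Actor, a.born AS Born, Movies
ORDER BY Actor
```

Unlike collect(DISTINCT m.title), the comprehension does not deduplicate titles, so results can differ if an actor has several ACTED_IN relationships to the same movie.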
match (s:Subscriber {field_9 : 'female'})-[:BELONGS_TO]->(:SubscriberList {id: 4})
return count(s)
This is a simple query with one relationship and two filters. Because Subscriber is large (up to 7M nodes), I want field_9 to be used for the first NodeIndexSeek, since the SubscriberList label only contains 15 nodes. Currently, the profile looks like:
If I change the query and force it to use the index, the result is:
match (s:Subscriber {field_9 : 'female'})-[:BELONGS_TO]->(:SubscriberList {id: 4})
using index s:Subscriber(field_9)
return count(s)
A more optimised query could be written with a subquery:
match (s:Subscriber { field_9: 'female' })-[:BELONGS_TO]-(sl:SubscriberList)
with count(s) as ss, sl.id as slId where slId = 4
return ss, slId
But my goal is to use the first query and force the planner to use field_9 for the first NodeIndexSeek. Any idea how to achieve this?
first query:
match (s:Subscriber {field_9 : 'female'})-[:BELONGS_TO]->(:SubscriberList {id: 4})
return count(s)
This might work:
MATCH (s:Subscriber {field_9: 'female'})-[:BELONGS_TO]->(sl:SubscriberList {id: 4})
USING INDEX s:Subscriber(field_9)
USING SCAN sl:SubscriberList
RETURN COUNT(s)
That USING SCAN hint tells Cypher to use scanning to find the desired sl nodes rather than the associated index. However, since the s node is found via indexing, hopefully the planner will be smart enough to generate a plan that follows the BELONGS_TO relationship to find the related SubscriberLists (instead of actually scanning all SubscriberList nodes).
You should profile this query and compare the total number of DB hits with your other queries to find the one that works best.
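A minimal way to make that comparison is to prefix each candidate with PROFILE and run them one at a time, then compare the total db hits reported for each. The sketch below assumes the index on :Subscriber(field_9) already exists, since a USING INDEX hint fails without it.

```cypher
// Baseline: let the planner choose its own starting point.
PROFILE
MATCH (s:Subscriber {field_9: 'female'})-[:BELONGS_TO]->(:SubscriberList {id: 4})
RETURN count(s);

// Hinted variant: index seek on s, label scan (not index) for sl.
PROFILE
MATCH (s:Subscriber {field_9: 'female'})-[:BELONGS_TO]->(sl:SubscriberList {id: 4})
USING INDEX s:Subscriber(field_9)
USING SCAN sl:SubscriberList
RETURN count(s);
```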
I get the following network result when I run this query in the Neo4j browser:
MATCH (n1:Item {name: 'A'})-[r]-(n2:Item) Return n1,r,n2
At the bottom of the graph, it says: Displaying 6 nodes, 7 relationships.
But when I look at the table in the Neo4j browser, I only have 5 records:
n1,r,n2
A,A->B,B
A,A->C,C
A,A->D,D
A,A->E,E
A,A->F,F
So in the java code, when I get the list of records using the code below:
List<Record> records = session.run(query).list();
I only get 5 records, so I only get the 5 relationships.
But I want to get all 7 relationships including the 2 below:
B->C
C->F
How can I achieve that using a Cypher query?
This should work:
MATCH (n:Item {name: 'A'})-[r1]-(n2:Item)
WITH n, COLLECT(r1) AS rs, COLLECT(n2) as others
UNWIND others AS n2
OPTIONAL MATCH (n2)-[r2]-(x)
WHERE x IN others
RETURN n, others, rs + COLLECT(r2) AS rs
Unlike @FrantišekHartman's first approach, this query uses UNWIND to bind n2 (which is not specified in the WITH clause and therefore becomes unbound) to the same n2 nodes found in the MATCH clause. This query also combines all the relationships into a single rs list.
There are many ways to achieve this. One way is to traverse to the 2nd level and check that the 2nd-level node is in the first level as well:
MATCH (n1:Item {name: 'A'})-[r]-(n2:Item)
WITH n1,collect(r) AS firstRels,collect(n2) AS firstNodes
OPTIONAL MATCH (n2)-[r2]-(n3:Item)
WHERE n3 IN firstNodes
RETURN n1,firstRels,firstNodes,collect(r2) as secondRels
Or you could do a Cartesian product between the first level nodes and match:
MATCH (n1:Item {name: 'A'})-[r]-(n2:Item)
WITH n1,collect(r) AS firstRels,collect(n2) as firstNodes
UNWIND firstNodes AS x
UNWIND firstNodes AS y
OPTIONAL MATCH (x)-[r2]-(y)
RETURN n1,firstRels,firstNodes,collect(r2) as secondRels
Depending on the cardinality of firstNodes and secondRels, and on other existing relationships, one might be faster than the other.
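Because the patterns above are undirected, a relationship such as B->C can be matched once from each end and so show up twice in the collected list. A possible variant of the traversal approach, sketched here with the same labels and the hypothetical aliases secondRels and allRels, deduplicates the second-level relationships and returns everything as one list:

```cypher
MATCH (n1:Item {name: 'A'})-[r]-(n2:Item)
WITH n1, collect(r) AS firstRels, collect(n2) AS firstNodes
// Re-bind each first-level node so the next pattern starts from it.
UNWIND firstNodes AS x
OPTIONAL MATCH (x)-[r2]-(y:Item)
WHERE y IN firstNodes
// collect() skips nulls from OPTIONAL MATCH; DISTINCT drops the second sighting of each relationship.
WITH n1, firstNodes, firstRels, collect(DISTINCT r2) AS secondRels
RETURN n1, firstNodes, firstRels + secondRels AS allRels
```

This returns a single record, so on the Java side the relationships would be read from the allRels list of that record rather than from one record per relationship.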
I have the following query:
CALL apoc.index.relationships('TO','user:37f0ce60-b428-11e8-bb45-9394d4f42b57') YIELD rel, start, end
WITH DISTINCT rel, start, end
MATCH (ctx:Context)
WHERE rel.context = ctx.uid AND (ctx.name="iG9CE55wbtY" )
RETURN DISTINCT start.uid AS source_id, start.name AS source_name, end.uid AS target_id, end.name AS target_name, rel.uid AS edge_id, ctx.name AS context_name, rel.statement AS statement_id, rel.weight AS weight;
Which uses indexed relationships. However, it takes about 4 to 10 seconds to process.
Here are the results with PROFILE:
Cypher version: CYPHER 3.3, planner: COST, runtime: INTERPRETED. 470705 total db hits in 2758 ms.
Is there anything I could optimize in this query, for instance, using parameters or rewriting it in any way that could improve the performance?
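On the parameters part of the question, one untested direction is sketched below. It passes the literals as parameters (so the query plan can be reused across calls) and looks up the single Context node first instead of matching all Context nodes per relationship. The parameter names $indexQuery and $contextName are hypothetical, and the sketch assumes an index exists on :Context(name). Whether it actually lowers db hits would need to be checked with PROFILE.

```cypher
// Hypothetical parameters supplied by the client, e.g.
//   $indexQuery  = 'user:37f0ce60-b428-11e8-bb45-9394d4f42b57'
//   $contextName = 'iG9CE55wbtY'
MATCH (ctx:Context {name: $contextName})
CALL apoc.index.relationships('TO', $indexQuery) YIELD rel, start, end
// Keep only relationships that belong to the context found above.
WITH ctx, rel, start, end
WHERE rel.context = ctx.uid
RETURN DISTINCT start.uid AS source_id, start.name AS source_name,
       end.uid AS target_id, end.name AS target_name,
       rel.uid AS edge_id, ctx.name AS context_name,
       rel.statement AS statement_id, rel.weight AS weight;
```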
I currently have this query:
START n=node(*)
MATCH (p:Person)-[:is_member]->(g:Group)
WHERE g.name ='FooManGroup'
RETURN p, count(p)
LIMIT 5
Say there are 42 people in FooManGroup, I want to return 5 of these people, with a count of 42.
Is this possible to do in one query?
Running this now returns 5 rows, which is fine, but a count of 104, which is the total number of nodes of any type in my DB.
Any suggestions?
You can use a WITH clause to do the counting of the persons, followed by an identical MATCH clause to do the matching of each person. Notice that you need to START on the p nodes and not just some n that will match any node in the graph:
MATCH (p:Person )-[:is_member]->(g:Group)
WHERE g.name ='FooManGroup'
WITH count(p) as personsInGroup
MATCH (p:Person)-[:is_member]->(g:Group)
WHERE g.name ='FooManGroup'
RETURN p, personsInGroup
LIMIT 5
It may not be the best or most elegant way to do this, but it works. If you use Cypher 2.0, it can be a bit more compact, like this:
MATCH (p:Person)-[:is_member]->(g:Group {name: 'FooManGroup'})
WITH count(p) as personsInGroup
MATCH (p:Person)-[:is_member]->(g:Group {name: 'FooManGroup'})
RETURN p, personsInGroup
LIMIT 5
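By convention, relationship types are written in upper case in Cypher, so :is_member would be :IS_MEMBER (assuming the type is renamed in the data as well, since types are case-sensitive), which I think is more readable: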
MATCH (p:Person)-[:IS_MEMBER]->(g:Group {name: 'FooManGroup'})
WITH count(p) as personsInGroup
MATCH (p:Person)-[:IS_MEMBER]->(g:Group {name: 'FooManGroup'})
RETURN p, personsInGroup
LIMIT 5
Try this:
MATCH (p:Person)-[:is_member]->(g:Group)
WHERE g.name ='FooManGroup'
RETURN count(p), collect(p)[0..5]
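If one row per person is preferred over a single row containing a list, the same single-pass idea can be extended by unwinding the slice again. This is a sketch using the hypothetical alias sample:

```cypher
MATCH (p:Person)-[:is_member]->(g:Group)
WHERE g.name = 'FooManGroup'
// One aggregation pass: total count plus the first five collected persons.
WITH count(p) AS personsInGroup, collect(p)[0..5] AS sample
// Turn the five-element list back into rows, each carrying the total.
UNWIND sample AS p
RETURN p, personsInGroup
```

Note that collect(p) still builds the full list before slicing, so for very large groups the two-MATCH version with LIMIT may use less memory.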