Forcing cost planner to start from a specific index seek - neo4j

My cypher query
EXPLAIN MATCH (b:Block)<-[:INCLUDED_IN]-(tx:Transaction {pstype: 0})
WHERE 1540512000 <= b.time < 1540598400
RETURN count(tx);
produces the following execution plan
--------------------------------------------+
| Operator | Estimated Rows | Identifiers | Other |
+-------------------+----------------+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| +ProduceResults | 12 | count(tx) | |
| | +----------------+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| +EagerAggregation | 12 | count(tx) | |
| | +----------------+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| +Filter | 136 | anon[16], b, tx | AndedPropertyInequalities(Variable(b),Property(Variable(b),PropertyKeyName(time)),GreaterThanOrEqual(Property(Variable(b),PropertyKeyName(time)),Parameter( AUTOINT2,Integer)), LessThan(Property(Variable(b),PropertyKeyName(time)),Parameter( AUTOINT1,Integer))) |
| | +----------------+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| +Expand(All) | 9052 | anon[16], b, tx | (tx)-[anon[16]:INCLUDED_IN]->(b) |
| | +----------------+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| +NodeIndexSeek | 9052 | tx | :Transaction(pstype) |
+-------------------+----------------+-----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
which executes way too slow because the first NodeIndexSeekByRange returns tens of millions of nodes instead of 9052. Using NodeIndexSeekByRange on b:Block(time) would produce around 600 nodes.
I have tried forcing the execution plan to start from b:Block(time), but instead it still keeps using NodeIndexSeek on tx:Transaction(pstype):
EXPLAIN MATCH (b:Block)<-[:INCLUDED_IN]-(tx:Transaction {pstype: 0})
USING INDEX b:Block(time)
WHERE 1540512000 <= b.time < 1540598400
RETURN count(tx);
produces
+-------------------------+----------------+-----------------+--------------------------------------------------------------+
| Operator | Estimated Rows | Identifiers | Other |
+-------------------------+----------------+-----------------+--------------------------------------------------------------+
| +ProduceResults | 12 | count(tx) | |
| | +----------------+-----------------+--------------------------------------------------------------+
| +EagerAggregation | 12 | count(tx) | |
| | +----------------+-----------------+--------------------------------------------------------------+
| +NodeHashJoin | 136 | anon[16], b, tx | b |
| |\ +----------------+-----------------+--------------------------------------------------------------+
| | +NodeIndexSeekByRange | 14703 | b | :Block(time) >= { AUTOINT2} AND :Block(time) < { AUTOINT1} |
| | +----------------+-----------------+--------------------------------------------------------------+
| +Expand(All) | 9052 | anon[16], b, tx | (tx)-[anon[16]:INCLUDED_IN]->(b) |
| | +----------------+-----------------+--------------------------------------------------------------+
| +NodeIndexSeek | 9052 | tx | :Transaction(pstype) |
+-------------------------+----------------+-----------------+--------------------------------------------------------------+
The only way I have gotten it to work fast is by using the rule planner: (multiple orders of magnitude faster)
CYPHER planner=rule MATCH (b:Block)
WHERE 1540512000 <= b.time < 1540598400
WITH b
MATCH (b)<-[:INCLUDED_IN]-(tx:Transaction {pstype: 0})
RETURN count(tx);
Is there a way to make it work when using the cost planner?
Both :Block(time) and :Transaction(pstype) are indexed.

You could try using a join hint on tx along with your index hint, which should ensure you only expand from one direction:
EXPLAIN
MATCH (b:Block)<-[:INCLUDED_IN]-(tx:Transaction {pstype: 0})
USING INDEX b:Block(time)
USING JOIN ON tx
WHERE 1540512000 <= b.time < 1540598400
RETURN count(tx);
Alternately you could restructure your query a bit so the tx node isn't initially part of the pattern, but enforced in the WHERE clause. You'll need to split the MATCH in 2, but I don't think you'll need any planner hints:
EXPLAIN
MATCH (tx:Transaction {pstype: 0})
MATCH (b:Block)<-[:INCLUDED_IN]-(x)
WHERE 1540512000 <= b.time < 1540598400
AND x = tx
RETURN count(tx);
EDIT
Okay, let's try another approach then:
EXPLAIN
MATCH (b:Block)<-[:INCLUDED_IN]-(x)
WHERE 1540512000 <= b.time < 1540598400
AND x.pstype = 0 // AND 'Transaction' in labels(x)
RETURN count(tx);
If we leave off the label then it can't use an indexed lookup. If there are other nodes besides :Transaction nodes that have a pstype property, you could try uncommenting the line where we use an alternate way to see if the node has that label (I don't think this will use an index lookup, but not completely sure).
Another alternative (unsure if this will work) is to use pattern comprehension to get a list of results from a pattern (after the initial match is found to b) and summing the sizes of the results:
EXPLAIN
MATCH (b:Block)
WHERE 1540512000 <= b.time < 1540598400
RETURN sum(size([(b)<-[:INCLUDED_IN]-(x:Transaction) WHERE x.pstype = 0 | x])) as count

Related

How to match node labels using OR?

match (p:Product {id:'5116003'})-[r]->(o:Attributes|ExtraAttribute) return p, o
How to match two possible node labels in such a query?
Per cybersam's suggestion, I changed to the follwoing:
MATCH (p:Product {id:'5116003'})-[r]->(o)
WHERE o:Attributes OR o:ExtraAttributes
**WHERE any(key in keys(o) WHERE toLower(key) contains 'weight')**
return o
Now I need to add the 2nd 'where' clause. How to modify that?
You can try using any() function:
match (p:Product {id:'5116003'})-[r]->(o)
where any (label in labels(o) where label in ['Attributes', 'ExtraAttribute'])
return p, o
Also, if you have APOC procedures, you can use apoc.path.expand path expander procedure that expands from start node following the given relationships from min to max-level adhering to the label filters.
match (p:Product {id:'5116003'})
call apoc.path.expand(p, null,"+Attributes|ExtraAttribute",0,1) yield path
with nodes(path) as nodes
// return p and o nodes
return nodes[0], nodes[1]
See more here.
These two single-label forms of your query:
MATCH (p:Product {id:'5116003'})-->(o:Attributes) RETURN p, o;
MATCH (p:Product {id:'5116003'})-->(o) WHERE o:Attributes RETURN p, o;
produce the same execution plan, as follows (I assume that there is an index on :Product(id)):
+-----------------+----------------+------+---------+------------------+--------------+
| Operator | Estimated Rows | Rows | DB Hits | Variables | Other |
+-----------------+----------------+------+---------+------------------+--------------+
| +ProduceResults | 0 | 0 | 0 | o, p | p, o |
| | +----------------+------+---------+------------------+--------------+
| +Filter | 0 | 0 | 0 | anon[33], o, p | o:Attributes |
| | +----------------+------+---------+------------------+--------------+
| +Expand(All) | 0 | 0 | 0 | anon[33], o -- p | (p)-->(o) |
| | +----------------+------+---------+------------------+--------------+
| +NodeIndexSeek | 0 | 0 | 1 | p | :Product(id) |
+-----------------+----------------+------+---------+------------------+--------------+
This two-label form of the second query above:
MATCH (p:Product {id:'5116003'})-->(o) WHERE o:Attributes OR o: ExtraAttribute RETURN p, o;
produces an execution plan that is very similar (and therefore probably not much more expensive):
+-----------------+----------------+------+---------+------------------+-------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Variables | Other |
+-----------------+----------------+------+---------+------------------+-------------------------------------+
| +ProduceResults | 0 | 0 | 0 | o, p | p, o |
| | +----------------+------+---------+------------------+-------------------------------------+
| +Filter | 0 | 0 | 0 | anon[33], o, p | Ors(o:Attributes, o:ExtraAttribute) |
| | +----------------+------+---------+------------------+-------------------------------------+
| +Expand(All) | 0 | 0 | 0 | anon[33], o -- p | (p)-->(o) |
| | +----------------+------+---------+------------------+-------------------------------------+
| +NodeIndexSeek | 0 | 0 | 1 | p | :Product(id) |
+-----------------+----------------+------+---------+------------------+-------------------------------------+
By the way, the first query in the answer by #BrunoPeres has a similar execution plan as well, but the Filter operation is very different. It is not clear which would be faster.
[UPDATE]
To answer your updated question: since you cannot have 2 back-to-back WHERE clauses, you can just add more terms to the already existing WHERE clause, like so:
MATCH (p:Product {id:'5116003'})-[r]->(o)
WHERE
(o:Attributes OR o:ExtraAttributes) AND
ANY(key in KEYS(o) WHERE TOLOWER(key) CONTAINS 'weight')
RETURN o;

Co-occurrences in Neo4j

I have a simple network in which each node has dui and name property and each relation has year and freq (frequency) property.
For example, if I wish to create ego network for node with dui = 'D000003', I use the following query (please note that I limit the number of results with WHERE clause):
MATCH (n {dui:'D000003'})<-[r]->(m) WHERE r.year = 2005 AND r.freq > 20 RETURN n.dui, m.dui;
and the corresponding result is:
+-----------------------+
| n.dui | m.dui |
+-----------------------+
| "D000003" | "D015995" |
| "D000003" | "D015169" |
| "D000003" | "D013552" |
| "D000003" | "D008460" |
| "D000003" | "D006801" |
| "D000003" | "D005516" |
| "D000003" | "D005506" |
| "D000003" | "D002418" |
| "D000003" | "D002417" |
| "D000003" | "D000818" |
+-----------------------+
Now I wonder how to get all relationships between nodes which are listed under the m.dui column; in other words I wish to produce co-occurrence graph for those nodes.
This should work:
MATCH (n { dui:'D000003' })-[r]-(m)
WHERE r.year = 2005 AND r.freq > 20
MATCH (n)-[rel]-(m)
RETURN n.dui, m.dui, COLLECT(rel) AS rels;
Note that I changed your weird (and, I believe, undocumented) <-[r]-> syntax to -[r]-, which means the directionality does not matter.

Neo4j/Cypher effective pagination with order by over large sub-graph

I have following simple relationship between (:User) nodes.
(:User)-[:FOLLOWS {timestamp}]->(:User)
If I paginate followers ordered by FOLLOWS.timestamp I'm running into performance problems when someone has millions of followers.
MATCH (u:User {Id:{id}})<-[f:FOLLOWS]-(follower)
WHERE f.timestamp <= {timestamp}
RETURN follower
ORDER BY f.timestamp DESC
LIMIT 100
What is suggested approach for paginating big sets of data when ordering is required?
UPDATE
follower timestamp
---------------------------------------
id(1000000) 1455967905
id(999999) 1455967875
id(999998) 1455967234
id(999997) 1455967123
id(999996) 1455965321
id(999995) 1455964123
id(999994) 1455963645
id(999993) 1455963512
id(999992) 1455961343
....
id(2) 1455909382
id(1) 1455908432
I want to slice this list down using timestamp which set on :FOLLOWS relationship. If I want to return batches of 4 followers I take current timestamp first and return 4 most recent, then 1455967123 and 4 most recent and so on. In order to do this the whole list should be order by timestamp which results in performance issues on millions of records.
If you're looking for the most recent followers, i.e. where the timestamp is greater than a given time, it only has to traverse the most recent ones.
You can do it with (2) in 20ms
If you are really looking for the oldest (first) followers it makes sense to skip ahead and don't look at the timestamp of every of the million followers (which takes about 1s on my system, see (3)). If you skip ahead the time goes down to 230ms, see (1)
In general we can see that on my laptop it does 2M db-operations per core and second.
(1) Look at first / oldest followers
PROFILE
> MATCH (u)<-[f:FOLLOWS]-(follower) WHERE id(u) = 0
> // skip ahead
> WITH f,follower SKIP 999000
> // do the actual check
> WITH f,follower WHERE f.ts < 500
> RETURN f, follower
> ORDER BY f.ts
> LIMIT 10;
+---------------------------------+
| f | follower |
+---------------------------------+
| :FOLLOWS[0]{ts:1} | Node[1]{} |
...
+---------------------------------+
10 rows
243 ms
Compiler CYPHER 2.3 Planner COST Runtime INTERPRETED
+-----------------+----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +ProduceResults | 1 | 10 | 0 | f, follower | f, follower |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection | 1 | 10 | 0 | anon[142], anon[155], anon[158], anon[178], f, follower, f, follower, u | anon[155]; anon[158] |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Top | 1 | 10 | 0 | anon[142], anon[155], anon[158], anon[178], f, follower, u | Literal(10); |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection | 1 | 499 | 499 | anon[142], anon[155], anon[158], anon[178], f, follower, u | anon[155]; anon[158]; anon[155].ts |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection | 1 | 499 | 0 | anon[142], anon[155], anon[158], f, follower, u | f; follower |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Filter | 1 | 499 | 0 | anon[142], f, follower, u | anon[142] |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection | 1 | 1000 | 1000 | anon[142], f, follower, u | f; follower; f.ts < { AUTOINT2} |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Skip | 1 | 1000 | 0 | f, follower, u | { AUTOINT1} |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Expand(All) | 1 | 1000000 | 1000001 | f, follower, u | (u)<-[ f#12:FOLLOWS]-( follower#24) |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +NodeByIdSeek | 1 | 1 | 1 | u | |
+-----------------+----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
Total database accesses: 1001501
(2) Look at most recent followers
PROFILE
> MATCH (u)<-[f:FOLLOWS]-(follower) WHERE id(u) = 0
> AND f.ts > 999500
> RETURN f, follower
> LIMIT 10;
+----------------------------------------------+
| f | follower |
+----------------------------------------------+
| :FOLLOWS[999839]{ts:999840} | Node[999840]{} |
...
+----------------------------------------------+
10 rows
23 ms
Compiler CYPHER 2.3 Planner COST Runtime INTERPRETED
+-----------------+----------------+-------+---------+----------------+---------------------------------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+-------+---------+----------------+---------------------------------------------------------------+
| +ProduceResults | 1 | 10 | 0 | f, follower | f, follower |
| | +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +Limit | 1 | 10 | 0 | f, follower, u | Literal(10) |
| | +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +Filter | 1 | 10 | 16394 | f, follower, u | AndedPropertyComparablePredicates(f,f.ts,f.ts > { AUTOINT1}) |
| | +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +Expand(All) | 1 | 16394 | 16395 | f, follower, u | (u)<-[f:FOLLOWS]-(follower) |
| | +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +NodeByIdSeek | 1 | 1 | 1 | u | |
+-----------------+----------------+-------+---------+----------------+---------------------------------------------------------------+
Total database accesses: 32790
(3) Find oldest followers without skipping ahead
PROFILE
> MATCH (u)<-[f:FOLLOWS]-(follower) WHERE id(u) = 0
> AND f.ts < 500
> RETURN f, follower
> LIMIT 10;
+-------------------------------------+
| f | follower |
+-------------------------------------+
...
| :FOLLOWS[491]{ts:492} | Node[492]{} |
+-------------------------------------+
10 rows
1008 ms
Compiler CYPHER 2.3 Planner COST Runtime INTERPRETED
+-----------------+----------------+--------+---------+----------------+---------------------------------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+--------+---------+----------------+---------------------------------------------------------------+
| +ProduceResults | 1 | 10 | 0 | f, follower | f, follower |
| | +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +Limit | 1 | 10 | 0 | f, follower, u | Literal(10) |
| | +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +Filter | 1 | 10 | 999498 | f, follower, u | AndedPropertyComparablePredicates(f,f.ts,f.ts < { AUTOINT1}) |
| | +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +Expand(All) | 1 | 999498 | 999499 | f, follower, u | (u)<-[f:FOLLOWS]-(follower) |
| | +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +NodeByIdSeek | 1 | 1 | 1 | u | |
+-----------------+----------------+--------+---------+----------------+---------------------------------------------------------------+
Total database accesses: 1998998

Neo4j: label vs. indexed property?

Suppose you're Twitter, and:
You have (:User) and (:Tweet) nodes;
Tweets can get flagged; and
You want to query the list of flagged tweets currently awaiting moderation.
You can either add a label for those tweets, e.g. :AwaitingModeration, or add and index a property, e.g. isAwaitingModeration = true|false.
Is one option inherently better than the other?
I know the best answer is probably to try and load test both :), but is there anything from Neo4j's implementation POV that makes one option more robust or suited for this kind of query?
Does it depend on the volume of tweets in this state at any given moment? If it's in the 10s vs. the 1000s, does that make a difference?
My impression is that labels are better suited for a large volume of nodes, whereas indexed properties are better for smaller volumes (ideally, unique nodes), but I'm not sure if that's actually true.
Thanks!
UPDATE: Follow up blog post published.
This is a common question when we model datasets for customers and a typical use case for Active/NonActive entities.
This is a little feedback about what I've experienced valid for Neo4j2.1.6 :
Point 1. You will not have difference in db accesses between matching on a label or on an indexed property and return the nodes
Point 2. The difference will be encountered when such nodes are at the end of a pattern, for example
MATCH (n:User {id:1})
WITH n
MATCH (n)-[:WRITTEN]->(post:Post)
WHERE post.published = true
RETURN n, collect(post) as posts;
-
PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:BlogPost)
> WHERE post.active = true
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina#yahoo.com"} | 1 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row
ColumnFilter(0)
|
+Extract
|
+ColumnFilter(1)
|
+EagerAggregation
|
+Filter
|
+SimplePatternMatcher
|
+SchemaIndex
+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+
| ColumnFilter(0) | 1 | 0 | | keep columns n, posts |
| Extract | 1 | 0 | | posts |
| ColumnFilter(1) | 1 | 0 | | keep columns n, AGGREGATION153 |
| EagerAggregation | 1 | 0 | | n |
| Filter | 1 | 3 | | (hasLabel(post:BlogPost(1)) AND Property(post,active(8)) == { AUTOBOOL1}) |
| SimplePatternMatcher | 1 | 12 | n, post, UNNAMED84 | |
| SchemaIndex | 1 | 2 | n, n | { AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+
Total database accesses: 17
In this case, Cypher will not make use of the index :Post(published).
Thus the use of labels is more performant in the case you have a ActivePost label for e.g. :
neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:ActivePost)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina#yahoo.com"} | 1 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row
ColumnFilter(0)
|
+Extract
|
+ColumnFilter(1)
|
+EagerAggregation
|
+Filter
|
+SimplePatternMatcher
|
+SchemaIndex
+----------------------+------+--------+----------------------+----------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+------+--------+----------------------+----------------------------------+
| ColumnFilter(0) | 1 | 0 | | keep columns n, posts |
| Extract | 1 | 0 | | posts |
| ColumnFilter(1) | 1 | 0 | | keep columns n, AGGREGATION130 |
| EagerAggregation | 1 | 0 | | n |
| Filter | 1 | 1 | | hasLabel(post:ActivePost(2)) |
| SimplePatternMatcher | 1 | 4 | n, post, UNNAMED84 | |
| SchemaIndex | 1 | 2 | n, n | { AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------+
Total database accesses: 7
Point 3. Always use labels for positives, meaning for the case above, having a Draft label will force you to execute the following query :
MATCH (n:User {id:1})
WITH n
MATCH (n)-[:POST]->(post:Post)
WHERE NOT post :Draft
RETURN n, collect(post) as posts;
Meaning that Cypher will open each node label headers and do a filter on it.
Point 4. Avoid having the need to match on multiple labels
MATCH (n:User {id:1})
WITH n
MATCH (n)-[:POST]->(post:Post:ActivePost)
RETURN n, collect(post) as posts;
neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:BlogPost:ActivePost)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina#yahoo.com"} | 1 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row
ColumnFilter(0)
|
+Extract
|
+ColumnFilter(1)
|
+EagerAggregation
|
+Filter
|
+SimplePatternMatcher
|
+SchemaIndex
+----------------------+------+--------+----------------------+---------------------------------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+------+--------+----------------------+---------------------------------------------------------------+
| ColumnFilter(0) | 1 | 0 | | keep columns n, posts |
| Extract | 1 | 0 | | posts |
| ColumnFilter(1) | 1 | 0 | | keep columns n, AGGREGATION139 |
| EagerAggregation | 1 | 0 | | n |
| Filter | 1 | 2 | | (hasLabel(post:BlogPost(1)) AND hasLabel(post:ActivePost(2))) |
| SimplePatternMatcher | 1 | 8 | n, post, UNNAMED84 | |
| SchemaIndex | 1 | 2 | n, n | { AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+---------------------------------------------------------------+
Total database accesses: 12
This will result in the same process for Cypher that on point 3.
Point 5. If possible, avoid the need to match on labels by having well typed named relationships
MATCH (n:User {id:1})
WITH n
MATCH (n)-[:PUBLISHED]->(p)
RETURN n, collect(p) as posts
-
MATCH (n:User {id:1})
WITH n
MATCH (n)-[:DRAFTED]->(post)
RETURN n, collect(post) as posts;
neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:DRAFTED]->(post)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina#yahoo.com"} | 3 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row
ColumnFilter(0)
|
+Extract
|
+ColumnFilter(1)
|
+EagerAggregation
|
+SimplePatternMatcher
|
+SchemaIndex
+----------------------+------+--------+----------------------+----------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+------+--------+----------------------+----------------------------------+
| ColumnFilter(0) | 1 | 0 | | keep columns n, posts |
| Extract | 1 | 0 | | posts |
| ColumnFilter(1) | 1 | 0 | | keep columns n, AGGREGATION119 |
| EagerAggregation | 1 | 0 | | n |
| SimplePatternMatcher | 3 | 0 | n, post, UNNAMED84 | |
| SchemaIndex | 1 | 2 | n, n | { AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------+
Total database accesses: 2
Will be more performant, because it will use all the power of the graph and just follow the relationships from the node resulting in no more db accesses than matching the user node and thus no filtering on labels.
This was my 0,02€

Neo4j and Cypher: Match nodes that have a single relationship to a target node

I'm trying to identify nodes that have only one relationship of a given type.
Imagine a graph of Route and Stop nodes. A Route may have 0 or more Stops, a Stop may be shared between multiple Routes, a Stop must always have at least 1 Route. I want to match and delete Stops that will be orphaned if a given Route is deleted.
Before anyone says anything, I know that it would be easier to just find stops without routes after the route is deleted, but that isn't an option. We're also not worried about deleting the routes here, just the stops.
Here's my query:
MATCH (r1:Route { id: {route_id} })-[rel1:HAS_STOP]->(s:Stop)
MATCH (r2:Route)-[rel2:HAS_STOP]->(s)
WITH s, COUNT(rel2) as c
WHERE c = 1
MATCH s-[rel2]-()
DELETE s, rel2
This works perfectly... but is there a better way? It feels like it could be more efficient but I'm not sure how.
EDIT
Here a query that matches only the nodes that will be orphaned without deleting the current route :
MATCH (route:Route {id:'99e08bdf-130f-3fca-8292-27d616fa025f'})
WITH route
OPTIONAL MATCH (route)-[r:HAS_STOP]->(s)
WHERE NOT EXISTS((route)--(s)<-[:HAS_STOP]-())
DELETE r,s
and the execution plan :
neo4j-sh (?)$ PROFILE MATCH (route:Route {id:'99e08bdf-130f-3fca-8292-27d616fa025f'})
> WITH route
> OPTIONAL MATCH (route)-[r:HAS_STOP]->(s)
> WHERE NOT EXISTS((route)--(s)<-[:HAS_STOP]-())
> DELETE r,s;
+-------------------+
| No data returned. |
+-------------------+
Nodes deleted: 2
Relationships deleted: 2
EmptyResult
|
+UpdateGraph
|
+Eager
|
+OptionalMatch
|
+SchemaIndex(1)
|
+Filter
|
+SimplePatternMatcher
|
+SchemaIndex(1)
+----------------------+------+--------+--------------+----------------------------------------------------------------------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+------+--------+--------------+----------------------------------------------------------------------------------------------------+
| EmptyResult | 0 | 0 | | |
| UpdateGraph | 2 | 4 | | DeleteEntity; DeleteEntity |
| Eager | 2 | 0 | | |
| OptionalMatch | 2 | 0 | | |
| SchemaIndex(1) | 1 | 2 | route, route | { AUTOSTRING0}; :Route(id) |
| Filter | 2 | 0 | | NOT(nonEmpty(PathExpression((route)-[ UNNAMED140]-(s),(160)-[ UNNAMED145:HAS_STOP]->(s), true))) |
| SimplePatternMatcher | 2 | 0 | route, s, r | |
| SchemaIndex(1) | 1 | 2 | route, route | { AUTOSTRING0}; :Route(id) |
+----------------------+------+--------+--------------+----------------------------------------------------------------------------------------------------+
Total database accesses: 8
** OLD ANSWER **
I let it here for helping maybe others :
In your query, you're not deleting the route nor the relationships to the stops that will not be orphaned. You can do all in one go.
This is what I have as query for the same use case than you, I also compared the two execution plans on a test graph, each route has about 160 stops and 2 stops that will be orphaned after the route deletion, the graph is available here : http://graphgen.neoxygen.io/?graph=JPnvQWZcQW685m
My query :
MATCH (route:Route {id:'e70ea0d4-03e2-3ca4-afc0-dfdc1754868e'})
WITH route
MATCH (route)-[r:HAS_STOP]->(s)
WITH r, collect(s) as stops
DELETE r, route
WITH filter(x in stops WHERE NOT x--()) as orphans
UNWIND orphans as orphan
DELETE orphan
Here is my profiled query :
neo4j-sh (?)$ PROFILE MATCH (route:Route {id:'1c565ac4-b72b-37c3-be7f-a38f2a7f66a8'})
> WITH route
> MATCH (route)-[r:HAS_STOP]->(s)
> WITH route, r, collect(s) as stops
> DELETE r, route
> WITH filter(x in stops WHERE NOT x--()) as orphans
> UNWIND orphans as orphan
> DELETE orphan;
+-------------------+
| No data returned. |
+-------------------+
Nodes deleted: 2
Relationships deleted: 157
EmptyResult
|
+UpdateGraph(0)
|
+UNWIND
|
+ColumnFilter(0)
|
+Eager
|
+Extract
|
+UpdateGraph(1)
|
+ColumnFilter(1)
|
+EagerAggregation
|
+SimplePatternMatcher
|
+SchemaIndex
+----------------------+------+--------+--------------+------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+------+--------+--------------+------------------------------+
| EmptyResult | 0 | 0 | | |
| UpdateGraph(0) | 1 | 1 | | DeleteEntity |
| UNWIND | 1 | 0 | | |
| ColumnFilter(0) | 157 | 0 | | keep columns orphans |
| Eager | 157 | 0 | | |
| Extract | 157 | 0 | | orphans |
| UpdateGraph(1) | 157 | 158 | | DeleteEntity; DeleteEntity |
| ColumnFilter(1) | 157 | 0 | | keep columns route, r, stops |
| EagerAggregation | 157 | 0 | | route, r |
| SimplePatternMatcher | 157 | 0 | route, s, r | |
| SchemaIndex | 1 | 2 | route, route | { AUTOSTRING0}; :Route(id) |
+----------------------+------+--------+--------------+------------------------------+
Total database accesses: 161
With your query :
I slightly modified your query to make use of schema indexes
And this is the Execution plan with your query, the difference in db accesses is quite high
PROFILE MATCH (r1:Route { id: '1c565ac4-b72b-37c3-be7f-a38f2a7f66a8' })
> WITH r1
> MATCH (r1)-[rel1:HAS_STOP]->(s:Stop)
> MATCH (r2:Route)-[rel2:HAS_STOP]->(s)
> WITH s, COUNT(rel2) as c
> WHERE c = 1
> MATCH s-[rel2]-()
> DELETE s, rel2;
+-------------------+
| No data returned. |
+-------------------+
Nodes deleted: 1
Relationships deleted: 1
EmptyResult
|
+UpdateGraph
|
+Eager
|
+SimplePatternMatcher(0)
|
+Filter(0)
|
+ColumnFilter
|
+EagerAggregation
|
+Filter(1)
|
+SimplePatternMatcher(1)
|
+Filter(2)
|
+SimplePatternMatcher(2)
|
+SchemaIndex
+-------------------------+------+--------+-----------------------+-----------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+-------------------------+------+--------+-----------------------+-----------------------------+
| EmptyResult | 0 | 0 | | |
| UpdateGraph | 1 | 2 | | DeleteEntity; DeleteEntity |
| Eager | 1 | 0 | | |
| SimplePatternMatcher(0) | 1 | 0 | UNNAMED200, s, rel2 | |
| Filter(0) | 1 | 0 | | c == { AUTOINT1} |
| ColumnFilter | 157 | 0 | | keep columns s, c |
| EagerAggregation | 157 | 0 | | s |
| Filter(1) | 4797 | 4797 | | hasLabel(r2:Route(4)) |
| SimplePatternMatcher(1) | 4797 | 4797 | r2, s, rel2 | |
| Filter(2) | 157 | 157 | | hasLabel(s:Stop(3)) |
| SimplePatternMatcher(2) | 157 | 157 | r1, s, rel1 | |
| SchemaIndex | 1 | 2 | r1, r1 | { AUTOSTRING0}; :Route(id) |
+-------------------------+------+--------+-----------------------+-----------------------------+
Total database accesses: 9912

Resources