Co-occurrences in Neo4j

I have a simple network in which each node has dui and name properties and each relationship has year and freq (frequency) properties.
For example, to create the ego network for the node with dui = 'D000003', I use the following query (note that I limit the number of results with the WHERE clause):
MATCH (n {dui:'D000003'})<-[r]->(m) WHERE r.year = 2005 AND r.freq > 20 RETURN n.dui, m.dui;
and the corresponding result is:
+-----------------------+
| n.dui     | m.dui     |
+-----------------------+
| "D000003" | "D015995" |
| "D000003" | "D015169" |
| "D000003" | "D013552" |
| "D000003" | "D008460" |
| "D000003" | "D006801" |
| "D000003" | "D005516" |
| "D000003" | "D005506" |
| "D000003" | "D002418" |
| "D000003" | "D002417" |
| "D000003" | "D000818" |
+-----------------------+
Now I wonder how to get all relationships between the nodes listed under the m.dui column; in other words, I wish to produce the co-occurrence graph for those nodes.

This should work:
MATCH (n { dui:'D000003' })-[r]-(m)
WHERE r.year = 2005 AND r.freq > 20
MATCH (n)-[rel]-(m)
RETURN n.dui, m.dui, COLLECT(rel) AS rels;
Note that I changed your weird (and, I believe, undocumented) <-[r]-> syntax to -[r]-, which means the directionality does not matter.
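Note that the query above returns the relationships between n and each m. If what you want is the relationships among the m nodes themselves, a minimal sketch could collect the neighbours first and then match only pairs inside that collection. The names neighbours, a and b are introduced here for illustration, and id() is assumed to be available (newer Neo4j versions prefer elementId()). Whether the year and freq filters should also apply to rel is left open, since the question does not say:

```cypher
// Step 1: collect the neighbours of the ego node (same filter as above)
MATCH (n {dui: 'D000003'})-[r]-(m)
WHERE r.year = 2005 AND r.freq > 20
WITH collect(DISTINCT m) AS neighbours
// Step 2: for each neighbour, keep only relationships to other neighbours
UNWIND neighbours AS a
MATCH (a)-[rel]-(b)
WHERE b IN neighbours
  AND id(a) < id(b)   // report each unordered pair once
RETURN a.dui, b.dui, collect(rel) AS rels;
```

The id(a) < id(b) condition is only there to avoid listing every pair twice, once in each direction.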

Related

Forcing cost planner to start from a specific index seek

My cypher query
EXPLAIN MATCH (b:Block)<-[:INCLUDED_IN]-(tx:Transaction {pstype: 0})
WHERE 1540512000 <= b.time < 1540598400
RETURN count(tx);
produces the following execution plan
--------------------------------------------+
| Operator | Estimated Rows | Identifiers | Other |
+-------------------+----------------+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| +ProduceResults | 12 | count(tx) | |
| | +----------------+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| +EagerAggregation | 12 | count(tx) | |
| | +----------------+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| +Filter | 136 | anon[16], b, tx | AndedPropertyInequalities(Variable(b),Property(Variable(b),PropertyKeyName(time)),GreaterThanOrEqual(Property(Variable(b),PropertyKeyName(time)),Parameter( AUTOINT2,Integer)), LessThan(Property(Variable(b),PropertyKeyName(time)),Parameter( AUTOINT1,Integer))) |
| | +----------------+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| +Expand(All) | 9052 | anon[16], b, tx | (tx)-[anon[16]:INCLUDED_IN]->(b) |
| | +----------------+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| +NodeIndexSeek | 9052 | tx | :Transaction(pstype) |
+-------------------+----------------+-----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
which executes far too slowly because the initial NodeIndexSeek on :Transaction(pstype) actually returns tens of millions of nodes instead of the estimated 9052. Using NodeIndexSeekByRange on b:Block(time) would produce around 600 nodes.
I have tried forcing the execution plan to start from b:Block(time), but it still keeps using NodeIndexSeek on tx:Transaction(pstype):
EXPLAIN MATCH (b:Block)<-[:INCLUDED_IN]-(tx:Transaction {pstype: 0})
USING INDEX b:Block(time)
WHERE 1540512000 <= b.time < 1540598400
RETURN count(tx);
produces
+-------------------------+----------------+-----------------+--------------------------------------------------------------+
| Operator | Estimated Rows | Identifiers | Other |
+-------------------------+----------------+-----------------+--------------------------------------------------------------+
| +ProduceResults | 12 | count(tx) | |
| | +----------------+-----------------+--------------------------------------------------------------+
| +EagerAggregation | 12 | count(tx) | |
| | +----------------+-----------------+--------------------------------------------------------------+
| +NodeHashJoin | 136 | anon[16], b, tx | b |
| |\ +----------------+-----------------+--------------------------------------------------------------+
| | +NodeIndexSeekByRange | 14703 | b | :Block(time) >= { AUTOINT2} AND :Block(time) < { AUTOINT1} |
| | +----------------+-----------------+--------------------------------------------------------------+
| +Expand(All) | 9052 | anon[16], b, tx | (tx)-[anon[16]:INCLUDED_IN]->(b) |
| | +----------------+-----------------+--------------------------------------------------------------+
| +NodeIndexSeek | 9052 | tx | :Transaction(pstype) |
+-------------------------+----------------+-----------------+--------------------------------------------------------------+
The only way I have gotten it to run fast (multiple orders of magnitude faster) is by using the rule planner:
CYPHER planner=rule MATCH (b:Block)
WHERE 1540512000 <= b.time < 1540598400
WITH b
MATCH (b)<-[:INCLUDED_IN]-(tx:Transaction {pstype: 0})
RETURN count(tx);
Is there a way to make it work when using the cost planner?
Both :Block(time) and :Transaction(pstype) are indexed.

You could try using a join hint on tx along with your index hint, which should ensure you only expand from one direction:
EXPLAIN
MATCH (b:Block)<-[:INCLUDED_IN]-(tx:Transaction {pstype: 0})
USING INDEX b:Block(time)
USING JOIN ON tx
WHERE 1540512000 <= b.time < 1540598400
RETURN count(tx);
Alternately you could restructure your query a bit so the tx node isn't initially part of the pattern, but enforced in the WHERE clause. You'll need to split the MATCH in 2, but I don't think you'll need any planner hints:
EXPLAIN
MATCH (tx:Transaction {pstype: 0})
MATCH (b:Block)<-[:INCLUDED_IN]-(x)
WHERE 1540512000 <= b.time < 1540598400
AND x = tx
RETURN count(tx);
EDIT
Okay, let's try another approach then:
EXPLAIN
MATCH (b:Block)<-[:INCLUDED_IN]-(x)
WHERE 1540512000 <= b.time < 1540598400
AND x.pstype = 0 // AND 'Transaction' in labels(x)
RETURN count(x);
If we leave off the label then it can't use an indexed lookup. If there are other nodes besides :Transaction nodes that have a pstype property, you could try uncommenting the line where we use an alternate way to see if the node has that label (I don't think this will use an index lookup, but not completely sure).
Another alternative (unsure if this will work) is to use pattern comprehension to get a list of results from a pattern (after the initial match is found to b) and summing the sizes of the results:
EXPLAIN
MATCH (b:Block)
WHERE 1540512000 <= b.time < 1540598400
RETURN sum(size([(b)<-[:INCLUDED_IN]-(x:Transaction) WHERE x.pstype = 0 | x])) as count

How to match node labels using OR?

match (p:Product {id:'5116003'})-[r]->(o:Attributes|ExtraAttribute) return p, o
How to match two possible node labels in such a query?
Per cybersam's suggestion, I changed it to the following:
MATCH (p:Product {id:'5116003'})-[r]->(o)
WHERE o:Attributes OR o:ExtraAttributes
**WHERE any(key in keys(o) WHERE toLower(key) contains 'weight')**
return o
Now I need to add the 2nd 'where' clause. How to modify that?

You can try using any() function:
match (p:Product {id:'5116003'})-[r]->(o)
where any (label in labels(o) where label in ['Attributes', 'ExtraAttribute'])
return p, o
Also, if you have APOC procedures, you can use apoc.path.expand path expander procedure that expands from start node following the given relationships from min to max-level adhering to the label filters.
match (p:Product {id:'5116003'})
call apoc.path.expand(p, null,"+Attributes|ExtraAttribute",0,1) yield path
with nodes(path) as nodes
// return p and o nodes
return nodes[0], nodes[1]

These two single-label forms of your query:
MATCH (p:Product {id:'5116003'})-->(o:Attributes) RETURN p, o;
MATCH (p:Product {id:'5116003'})-->(o) WHERE o:Attributes RETURN p, o;
produce the same execution plan, as follows (I assume that there is an index on :Product(id)):
+-----------------+----------------+------+---------+------------------+--------------+
| Operator | Estimated Rows | Rows | DB Hits | Variables | Other |
+-----------------+----------------+------+---------+------------------+--------------+
| +ProduceResults | 0 | 0 | 0 | o, p | p, o |
| | +----------------+------+---------+------------------+--------------+
| +Filter | 0 | 0 | 0 | anon[33], o, p | o:Attributes |
| | +----------------+------+---------+------------------+--------------+
| +Expand(All) | 0 | 0 | 0 | anon[33], o -- p | (p)-->(o) |
| | +----------------+------+---------+------------------+--------------+
| +NodeIndexSeek | 0 | 0 | 1 | p | :Product(id) |
+-----------------+----------------+------+---------+------------------+--------------+
This two-label form of the second query above:
MATCH (p:Product {id:'5116003'})-->(o) WHERE o:Attributes OR o:ExtraAttribute RETURN p, o;
produces an execution plan that is very similar (and therefore probably not much more expensive):
+-----------------+----------------+------+---------+------------------+-------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Variables | Other |
+-----------------+----------------+------+---------+------------------+-------------------------------------+
| +ProduceResults | 0 | 0 | 0 | o, p | p, o |
| | +----------------+------+---------+------------------+-------------------------------------+
| +Filter | 0 | 0 | 0 | anon[33], o, p | Ors(o:Attributes, o:ExtraAttribute) |
| | +----------------+------+---------+------------------+-------------------------------------+
| +Expand(All) | 0 | 0 | 0 | anon[33], o -- p | (p)-->(o) |
| | +----------------+------+---------+------------------+-------------------------------------+
| +NodeIndexSeek | 0 | 0 | 1 | p | :Product(id) |
+-----------------+----------------+------+---------+------------------+-------------------------------------+
By the way, the first query in the answer by @BrunoPeres has a similar execution plan as well, but its Filter operation is very different. It is not clear which would be faster.
[UPDATE]
To answer your updated question: since you cannot have 2 back-to-back WHERE clauses, you can just add more terms to the already existing WHERE clause, like so:
MATCH (p:Product {id:'5116003'})-[r]->(o)
WHERE
(o:Attributes OR o:ExtraAttributes) AND
ANY(key in KEYS(o) WHERE TOLOWER(key) CONTAINS 'weight')
RETURN o;

Finding root of a tree in a directed graph

I have a tree structure like node(1)->node(2)->node(3). Each node has a name property that I use to retrieve it.
Given a node, say node(3), I want to retrieve node(1).
Query tried:
MATCH (p:Node)-[:HAS*]->(c:Node) WHERE c.name = "node 3" RETURN p LIMIT 5
But I am not able to get node 1.

Your query won't return only "node 1" (it returns every ancestor of "node 3"), but its results should at least include it. It's possible to filter the paths to keep only the one traversing all the way to the root, however:
MATCH (c:Node {name: "node 3"})<-[:HAS*0..]-(p:Node)
// The root does not have any incoming relationship
WHERE NOT (p)<-[:HAS]-()
RETURN p
Note the use of the 0 length, which matches all cases, including the one where the start node is the root.
Fun fact: even if you have an index on :Node(name), it won't be used unless you specify it explicitly (or unless you're using Neo4j 3.1, where this seems to be fixed, since 3.1 Beta2 at least):
MATCH (c:Node {name: "node 3"})<-[:HAS*0..]-(p:Node)
USING INDEX c:Node(name)
WHERE NOT (p)<-[:HAS]-()
RETURN p
Using PROFILE on the first query (with a numerical id property instead of name):
+-----------------------+----------------+------+---------+-------------------------+----------------------+
| Operator | Estimated Rows | Rows | DB Hits | Variables | Other |
+-----------------------+----------------+------+---------+-------------------------+----------------------+
| +ProduceResults | 0 | 1 | 0 | p | p |
| | +----------------+------+---------+-------------------------+----------------------+
| +AntiSemiApply | 0 | 1 | 0 | anon[23], c -- p | |
| |\ +----------------+------+---------+-------------------------+----------------------+
| | +Expand(All) | 1 | 0 | 3 | anon[58], anon[67] -- p | (p)<-[:HAS]-() |
| | | +----------------+------+---------+-------------------------+----------------------+
| | +Argument | 1 | 3 | 0 | p | |
| | +----------------+------+---------+-------------------------+----------------------+
| +Filter | 1 | 3 | 3 | anon[23], c, p | p:Node |
| | +----------------+------+---------+-------------------------+----------------------+
| +VarLengthExpand(All) | 1 | 3 | 5 | anon[23], p -- c | (c)<-[:HAS*]-(p) |
| | +----------------+------+---------+-------------------------+----------------------+
| +Filter | 1 | 1 | 3 | c | c.id == { AUTOINT0} |
| | +----------------+------+---------+-------------------------+----------------------+
| +NodeByLabelScan | 3 | 3 | 4 | c | :Node |
+-----------------------+----------------+------+---------+-------------------------+----------------------+
Total database accesses: 18
and on the second one:
+-----------------------+----------------+------+---------+-------------------------+------------------+
| Operator | Estimated Rows | Rows | DB Hits | Variables | Other |
+-----------------------+----------------+------+---------+-------------------------+------------------+
| +ProduceResults | 0 | 1 | 0 | p | p |
| | +----------------+------+---------+-------------------------+------------------+
| +AntiSemiApply | 0 | 1 | 0 | anon[23], c -- p | |
| |\ +----------------+------+---------+-------------------------+------------------+
| | +Expand(All) | 1 | 0 | 3 | anon[81], anon[90] -- p | (p)<-[:HAS]-() |
| | | +----------------+------+---------+-------------------------+------------------+
| | +Argument | 1 | 3 | 0 | p | |
| | +----------------+------+---------+-------------------------+------------------+
| +Filter | 1 | 3 | 3 | anon[23], c, p | p:Node |
| | +----------------+------+---------+-------------------------+------------------+
| +VarLengthExpand(All) | 1 | 3 | 5 | anon[23], p -- c | (c)<-[:HAS*]-(p) |
| | +----------------+------+---------+-------------------------+------------------+
| +NodeUniqueIndexSeek | 1 | 1 | 2 | c | :Node(id) |
+-----------------------+----------------+------+---------+-------------------------+------------------+
Total database accesses: 13

Neo4j: label vs. indexed property?

Suppose you're Twitter, and:
You have (:User) and (:Tweet) nodes;
Tweets can get flagged; and
You want to query the list of flagged tweets currently awaiting moderation.
You can either add a label for those tweets, e.g. :AwaitingModeration, or add and index a property, e.g. isAwaitingModeration = true|false.
Is one option inherently better than the other?
I know the best answer is probably to try and load test both :), but is there anything from Neo4j's implementation POV that makes one option more robust or suited for this kind of query?
Does it depend on the volume of tweets in this state at any given moment? If it's in the 10s vs. the 1000s, does that make a difference?
My impression is that labels are better suited for a large volume of nodes, whereas indexed properties are better for smaller volumes (ideally, unique nodes), but I'm not sure if that's actually true.
Thanks!

UPDATE: Follow up blog post published.
This is a common question when we model datasets for customers and a typical use case for Active/NonActive entities.
Here is some feedback from what I've experienced; it is valid for Neo4j 2.1.6:
Point 1. There is no difference in db accesses between matching on a label and matching on an indexed property when you simply return the nodes.
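For the flagged-tweets case in the question, the two options compared in Point 1 could be sketched as follows. The label and property names are taken from the question; the second query assumes an index on :Tweet(isAwaitingModeration) already exists. Prefixing each query with PROFILE lets you compare the db accesses on your own data:

```cypher
// Option A: a dedicated label marks tweets awaiting moderation
MATCH (t:AwaitingModeration)
RETURN t;

// Option B: an indexed boolean property on :Tweet
// (assumes an index on :Tweet(isAwaitingModeration))
MATCH (t:Tweet)
WHERE t.isAwaitingModeration = true
RETURN t;
```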
Point 2. The difference appears when such nodes are at the end of a pattern, for example:
MATCH (n:User {id:1})
WITH n
MATCH (n)-[:WRITTEN]->(post:Post)
WHERE post.published = true
RETURN n, collect(post) as posts;
-
PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:BlogPost)
> WHERE post.active = true
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina#yahoo.com"} | 1 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row
ColumnFilter(0)
|
+Extract
|
+ColumnFilter(1)
|
+EagerAggregation
|
+Filter
|
+SimplePatternMatcher
|
+SchemaIndex
+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+
| ColumnFilter(0) | 1 | 0 | | keep columns n, posts |
| Extract | 1 | 0 | | posts |
| ColumnFilter(1) | 1 | 0 | | keep columns n, AGGREGATION153 |
| EagerAggregation | 1 | 0 | | n |
| Filter | 1 | 3 | | (hasLabel(post:BlogPost(1)) AND Property(post,active(8)) == { AUTOBOOL1}) |
| SimplePatternMatcher | 1 | 12 | n, post, UNNAMED84 | |
| SchemaIndex | 1 | 2 | n, n | { AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+
Total database accesses: 17
In this case, Cypher will not make use of the index :Post(published).
Thus using labels is more performant here, for example if you have an ActivePost label:
neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:ActivePost)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina#yahoo.com"} | 1 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row
ColumnFilter(0)
|
+Extract
|
+ColumnFilter(1)
|
+EagerAggregation
|
+Filter
|
+SimplePatternMatcher
|
+SchemaIndex
+----------------------+------+--------+----------------------+----------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+------+--------+----------------------+----------------------------------+
| ColumnFilter(0) | 1 | 0 | | keep columns n, posts |
| Extract | 1 | 0 | | posts |
| ColumnFilter(1) | 1 | 0 | | keep columns n, AGGREGATION130 |
| EagerAggregation | 1 | 0 | | n |
| Filter | 1 | 1 | | hasLabel(post:ActivePost(2)) |
| SimplePatternMatcher | 1 | 4 | n, post, UNNAMED84 | |
| SchemaIndex | 1 | 2 | n, n | { AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------+
Total database accesses: 7
Point 3. Always use labels for positives. For the case above, having a Draft label would force you to execute the following query:
MATCH (n:User {id:1})
WITH n
MATCH (n)-[:POST]->(post:Post)
WHERE NOT post:Draft
RETURN n, collect(post) as posts;
This means that Cypher will open the label header of each node and filter on it.
Point 4. Avoid the need to match on multiple labels:
MATCH (n:User {id:1})
WITH n
MATCH (n)-[:POST]->(post:Post:ActivePost)
RETURN n, collect(post) as posts;
neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:BlogPost:ActivePost)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina#yahoo.com"} | 1 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row
ColumnFilter(0)
|
+Extract
|
+ColumnFilter(1)
|
+EagerAggregation
|
+Filter
|
+SimplePatternMatcher
|
+SchemaIndex
+----------------------+------+--------+----------------------+---------------------------------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+------+--------+----------------------+---------------------------------------------------------------+
| ColumnFilter(0) | 1 | 0 | | keep columns n, posts |
| Extract | 1 | 0 | | posts |
| ColumnFilter(1) | 1 | 0 | | keep columns n, AGGREGATION139 |
| EagerAggregation | 1 | 0 | | n |
| Filter | 1 | 2 | | (hasLabel(post:BlogPost(1)) AND hasLabel(post:ActivePost(2))) |
| SimplePatternMatcher | 1 | 8 | n, post, UNNAMED84 | |
| SchemaIndex | 1 | 2 | n, n | { AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+---------------------------------------------------------------+
Total database accesses: 12
This results in the same process for Cypher as in point 3.
Point 5. If possible, avoid the need to match on labels by using well-named relationship types:
MATCH (n:User {id:1})
WITH n
MATCH (n)-[:PUBLISHED]->(p)
RETURN n, collect(p) as posts
-
MATCH (n:User {id:1})
WITH n
MATCH (n)-[:DRAFTED]->(post)
RETURN n, collect(post) as posts;
neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:DRAFTED]->(post)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina#yahoo.com"} | 3 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row
ColumnFilter(0)
|
+Extract
|
+ColumnFilter(1)
|
+EagerAggregation
|
+SimplePatternMatcher
|
+SchemaIndex
+----------------------+------+--------+----------------------+----------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+------+--------+----------------------+----------------------------------+
| ColumnFilter(0) | 1 | 0 | | keep columns n, posts |
| Extract | 1 | 0 | | posts |
| ColumnFilter(1) | 1 | 0 | | keep columns n, AGGREGATION119 |
| EagerAggregation | 1 | 0 | | n |
| SimplePatternMatcher | 3 | 0 | n, post, UNNAMED84 | |
| SchemaIndex | 1 | 2 | n, n | { AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------+
Total database accesses: 2
This will be more performant because it uses all the power of the graph and just follows the relationships from the node, resulting in no more db accesses than matching the user node, and thus no filtering on labels.
This was my 0,02€

Neo4j and Cypher: Match nodes that have a single relationship to a target node

I'm trying to identify nodes that have only one relationship of a given type.
Imagine a graph of Route and Stop nodes. A Route may have 0 or more Stops, a Stop may be shared between multiple Routes, a Stop must always have at least 1 Route. I want to match and delete Stops that will be orphaned if a given Route is deleted.
Before anyone says anything, I know that it would be easier to just find stops without routes after the route is deleted, but that isn't an option. We're also not worried about deleting the routes here, just the stops.
Here's my query:
MATCH (r1:Route { id: {route_id} })-[rel1:HAS_STOP]->(s:Stop)
MATCH (r2:Route)-[rel2:HAS_STOP]->(s)
WITH s, COUNT(rel2) as c
WHERE c = 1
MATCH (s)-[rel2]-()
DELETE s, rel2
This works perfectly... but is there a better way? It feels like it could be more efficient but I'm not sure how.

EDIT
Here is a query that matches only the stops that will be orphaned, without deleting the current route:
MATCH (route:Route {id:'99e08bdf-130f-3fca-8292-27d616fa025f'})
WITH route
OPTIONAL MATCH (route)-[r:HAS_STOP]->(s)
WHERE NOT EXISTS((route)--(s)<-[:HAS_STOP]-())
DELETE r,s
and the execution plan :
neo4j-sh (?)$ PROFILE MATCH (route:Route {id:'99e08bdf-130f-3fca-8292-27d616fa025f'})
> WITH route
> OPTIONAL MATCH (route)-[r:HAS_STOP]->(s)
> WHERE NOT EXISTS((route)--(s)<-[:HAS_STOP]-())
> DELETE r,s;
+-------------------+
| No data returned. |
+-------------------+
Nodes deleted: 2
Relationships deleted: 2
EmptyResult
|
+UpdateGraph
|
+Eager
|
+OptionalMatch
|
+SchemaIndex(1)
|
+Filter
|
+SimplePatternMatcher
|
+SchemaIndex(1)
+----------------------+------+--------+--------------+----------------------------------------------------------------------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+------+--------+--------------+----------------------------------------------------------------------------------------------------+
| EmptyResult | 0 | 0 | | |
| UpdateGraph | 2 | 4 | | DeleteEntity; DeleteEntity |
| Eager | 2 | 0 | | |
| OptionalMatch | 2 | 0 | | |
| SchemaIndex(1) | 1 | 2 | route, route | { AUTOSTRING0}; :Route(id) |
| Filter | 2 | 0 | | NOT(nonEmpty(PathExpression((route)-[ UNNAMED140]-(s),(160)-[ UNNAMED145:HAS_STOP]->(s), true))) |
| SimplePatternMatcher | 2 | 0 | route, s, r | |
| SchemaIndex(1) | 1 | 2 | route, route | { AUTOSTRING0}; :Route(id) |
+----------------------+------+--------+--------------+----------------------------------------------------------------------------------------------------+
Total database accesses: 8
** OLD ANSWER **
I'm leaving it here in case it helps others:
In your query, you're deleting neither the route nor the relationships to the stops that will not be orphaned. You can do it all in one go.
This is the query I use for the same use case as yours. I also compared the two execution plans on a test graph in which each route has about 160 stops, 2 of which will be orphaned after the route deletion. The graph is available here: http://graphgen.neoxygen.io/?graph=JPnvQWZcQW685m
My query:
MATCH (route:Route {id:'e70ea0d4-03e2-3ca4-afc0-dfdc1754868e'})
WITH route
MATCH (route)-[r:HAS_STOP]->(s)
WITH route, r, collect(s) as stops
DELETE r, route
WITH filter(x in stops WHERE NOT x--()) as orphans
UNWIND orphans as orphan
DELETE orphan
Here is my profiled query :
neo4j-sh (?)$ PROFILE MATCH (route:Route {id:'1c565ac4-b72b-37c3-be7f-a38f2a7f66a8'})
> WITH route
> MATCH (route)-[r:HAS_STOP]->(s)
> WITH route, r, collect(s) as stops
> DELETE r, route
> WITH filter(x in stops WHERE NOT x--()) as orphans
> UNWIND orphans as orphan
> DELETE orphan;
+-------------------+
| No data returned. |
+-------------------+
Nodes deleted: 2
Relationships deleted: 157
EmptyResult
|
+UpdateGraph(0)
|
+UNWIND
|
+ColumnFilter(0)
|
+Eager
|
+Extract
|
+UpdateGraph(1)
|
+ColumnFilter(1)
|
+EagerAggregation
|
+SimplePatternMatcher
|
+SchemaIndex
+----------------------+------+--------+--------------+------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+------+--------+--------------+------------------------------+
| EmptyResult | 0 | 0 | | |
| UpdateGraph(0) | 1 | 1 | | DeleteEntity |
| UNWIND | 1 | 0 | | |
| ColumnFilter(0) | 157 | 0 | | keep columns orphans |
| Eager | 157 | 0 | | |
| Extract | 157 | 0 | | orphans |
| UpdateGraph(1) | 157 | 158 | | DeleteEntity; DeleteEntity |
| ColumnFilter(1) | 157 | 0 | | keep columns route, r, stops |
| EagerAggregation | 157 | 0 | | route, r |
| SimplePatternMatcher | 157 | 0 | route, s, r | |
| SchemaIndex | 1 | 2 | route, route | { AUTOSTRING0}; :Route(id) |
+----------------------+------+--------+--------------+------------------------------+
Total database accesses: 161
With your query:
I slightly modified your query to make use of schema indexes.
This is the execution plan with your query; the difference in db accesses is quite high:
PROFILE MATCH (r1:Route { id: '1c565ac4-b72b-37c3-be7f-a38f2a7f66a8' })
> WITH r1
> MATCH (r1)-[rel1:HAS_STOP]->(s:Stop)
> MATCH (r2:Route)-[rel2:HAS_STOP]->(s)
> WITH s, COUNT(rel2) as c
> WHERE c = 1
> MATCH s-[rel2]-()
> DELETE s, rel2;
+-------------------+
| No data returned. |
+-------------------+
Nodes deleted: 1
Relationships deleted: 1
EmptyResult
|
+UpdateGraph
|
+Eager
|
+SimplePatternMatcher(0)
|
+Filter(0)
|
+ColumnFilter
|
+EagerAggregation
|
+Filter(1)
|
+SimplePatternMatcher(1)
|
+Filter(2)
|
+SimplePatternMatcher(2)
|
+SchemaIndex
+-------------------------+------+--------+-----------------------+-----------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+-------------------------+------+--------+-----------------------+-----------------------------+
| EmptyResult | 0 | 0 | | |
| UpdateGraph | 1 | 2 | | DeleteEntity; DeleteEntity |
| Eager | 1 | 0 | | |
| SimplePatternMatcher(0) | 1 | 0 | UNNAMED200, s, rel2 | |
| Filter(0) | 1 | 0 | | c == { AUTOINT1} |
| ColumnFilter | 157 | 0 | | keep columns s, c |
| EagerAggregation | 157 | 0 | | s |
| Filter(1) | 4797 | 4797 | | hasLabel(r2:Route(4)) |
| SimplePatternMatcher(1) | 4797 | 4797 | r2, s, rel2 | |
| Filter(2) | 157 | 157 | | hasLabel(s:Stop(3)) |
| SimplePatternMatcher(2) | 157 | 157 | r1, s, rel1 | |
| SchemaIndex | 1 | 2 | r1, r1 | { AUTOSTRING0}; :Route(id) |
+-------------------------+------+--------+-----------------------+-----------------------------+
Total database accesses: 9912
