Finding root of a tree in a directed graph - neo4j

I have a tree structure like node(1)->node(2)->node(3). I have name as an property used to retrieve a node.
Given a node say node(3), i wanna retrieve node(1).
Query tried :
MATCH (p:Node)-[:HAS*]->(c:Node) WHERE c.name = "node 3" RETURN p LIMIT 5
But, not able to get node 1.

Your query will not only return "node 1", but it should at least include one path containing it. It's possible to filter the paths to only get the one traversing all the way to the root, however:
MATCH (c:Node {name: "node 3"})<-[:HAS*0..]-(p:Node)
// The root does not have any incoming relationship
WHERE NOT (p)<-[:HAS]-()
RETURN p
Note the use of the 0 length, which matches all cases, including the one where the start node is the root.
Fun fact: even if you have an index on Node:name, it won't be used (unless you're using Neo4j 3.1, where it seems to be fixed since 3.1 Beta2 at least) and you have to explicitly specify it.
MATCH (c:Node {name: "node 3"})<-[:HAS*0..]-(p:Node)
USING INDEX c:Node(name)
WHERE NOT (p)<-[:HAS]-()
RETURN p
Using PROFILE on the first query (with a numerical id property instead of name):
+-----------------------+----------------+------+---------+-------------------------+----------------------+
| Operator | Estimated Rows | Rows | DB Hits | Variables | Other |
+-----------------------+----------------+------+---------+-------------------------+----------------------+
| +ProduceResults | 0 | 1 | 0 | p | p |
| | +----------------+------+---------+-------------------------+----------------------+
| +AntiSemiApply | 0 | 1 | 0 | anon[23], c -- p | |
| |\ +----------------+------+---------+-------------------------+----------------------+
| | +Expand(All) | 1 | 0 | 3 | anon[58], anon[67] -- p | (p)<-[:HAS]-() |
| | | +----------------+------+---------+-------------------------+----------------------+
| | +Argument | 1 | 3 | 0 | p | |
| | +----------------+------+---------+-------------------------+----------------------+
| +Filter | 1 | 3 | 3 | anon[23], c, p | p:Node |
| | +----------------+------+---------+-------------------------+----------------------+
| +VarLengthExpand(All) | 1 | 3 | 5 | anon[23], p -- c | (c)<-[:HAS*]-(p) |
| | +----------------+------+---------+-------------------------+----------------------+
| +Filter | 1 | 1 | 3 | c | c.id == { AUTOINT0} |
| | +----------------+------+---------+-------------------------+----------------------+
| +NodeByLabelScan | 3 | 3 | 4 | c | :Node |
+-----------------------+----------------+------+---------+-------------------------+----------------------+
Total database accesses: 18
and on the second one:
+-----------------------+----------------+------+---------+-------------------------+------------------+
| Operator | Estimated Rows | Rows | DB Hits | Variables | Other |
+-----------------------+----------------+------+---------+-------------------------+------------------+
| +ProduceResults | 0 | 1 | 0 | p | p |
| | +----------------+------+---------+-------------------------+------------------+
| +AntiSemiApply | 0 | 1 | 0 | anon[23], c -- p | |
| |\ +----------------+------+---------+-------------------------+------------------+
| | +Expand(All) | 1 | 0 | 3 | anon[81], anon[90] -- p | (p)<-[:HAS]-() |
| | | +----------------+------+---------+-------------------------+------------------+
| | +Argument | 1 | 3 | 0 | p | |
| | +----------------+------+---------+-------------------------+------------------+
| +Filter | 1 | 3 | 3 | anon[23], c, p | p:Node |
| | +----------------+------+---------+-------------------------+------------------+
| +VarLengthExpand(All) | 1 | 3 | 5 | anon[23], p -- c | (c)<-[:HAS*]-(p) |
| | +----------------+------+---------+-------------------------+------------------+
| +NodeUniqueIndexSeek | 1 | 1 | 2 | c | :Node(id) |
+-----------------------+----------------+------+---------+-------------------------+------------------+
Total database accesses: 13

Related

Cypher query aggregate sum of values

I have a Cypher query that shows the following output:
+----------------
| usid | count |
+----------------
| "000" | 1 |
| "000" | 0 |
| "000" | 0 |
| "001" | 1 |
| "001" | 1 |
| "001" | 0 |
| "002" | 2 |
| "002" | 2 |
| "002" | 0 |
| "003" | 4 |
| "003" | 2 |
| "003" | 2 |
| "004" | 4 |
| "004" | 4 |
| "004" | 4 |
+----------------
How can I get the below result with the condition SUM(count) <= 9.
+----------------
| usid | count |
+----------------
| "000" | 1 |
| "001" | 2 |
| "002" | 4 |
| "003" | 8 |
+----------------
Note: I have used the below query to get the 1st table data.
MATCH (us:USER)
WITH us
WHERE us.count <= 4
RETURN us.id as usid, us.count as count;
I don't know how you get your original data, so I will just use a WITH clause and assume the data is there:
// original data
WITH usid, count
// aggregate and filter
WITH usid, sum(count) as new_count
WHERE new_count <= 9
RETURN usid, new_count
Based on the updated question, the new query would look like:
MATCH (us:USER)
WHERE us.count <= 4
WITH us.id as usid, sum(us.count) as count
WHERE new_count <= 9
RETURN usid, count
˙˙˙

DISTINCT, MAX and list value from join table

How to combine distinct and max within join table below?
Table_details_usage
UID | VE_NO | START_MILEAGE | END_MILEAGE
------------------------------------------------
1 | ASD | 410000 | 410500
2 | JWQ | 212000 | 212350
3 | WYS | 521000 | 521150
4 | JWQ | 212360 | 212400
5 | ASD | 410520 | 410600
Table_service_schedule
SID | VE_NO | SV_ONMILEAGE | SV_NEXTMILEAGE
------------------------------------------------
1 | ASD | 400010 | 410010
2 | JWQ | 212120 | 222120
3 | WYS | 511950 | 521950
4 | JWQ | 212300 | 222300
5 | ASD | 410510 | 420510
How to get display as below (only max value)?
Get Max value from Table_service_schedule (SV_NEXTMILEAGE) and Get Max value from Table_details_usage (END_MILEAGE)
SID | VE_NO | SV_NEXTMILEAGE | END_MILEAGE
--------------------------------------------
5 | ASD | 420510 | 410600
4 | JWQ | 222300 | 212400
3 | WYS | 521950 | 521150
Something in the lines of:
SELECT
SID,
VE_NO,
SV_NEXTMILEAGE,
(select max(END_MILEAGE) from Table_details_usage d where d.VE_NO = s.VE_NO) END_MILEAGE
FROM Table_service_schedule s
WHERE SID = (SELECT max(SID) FROM Table_service_schedule s2 WHERE s2.VE_NO = s.VE_NO)
Probably could need to change the direct value ov SV_NEXTMILEAGE to max as well if the id:s aren't in order...

"Extract" intervals from series in Google Sheets

If I in Google Sheets have a series defined as
[29060, 29062, 29331, 29332, 29333, 29334, 29335, 29336, 29337, 29338, 29339, 29340, 29341,
29342, 29372, 29373].
How do I make them line up in intervals like this?
|To |From |
|29060 |29062 |
|29331 |29342 |
|29372 |29373 |
I can't find any good answers for this anywhere. Please, help!
Data/Formulas
A1:
29060, 29062, 29331, 29332, 29333, 29334, 29335, 29336, 29337, 29338, 29339, 29340, 29341,
29342, 29372, 29373
B1: =transpose(split(A1,",")). Converts the input text is an a vertical array.
C1: =FILTER(B1:B16,mod(ROW(B1:B16),2)<>0). Returns values in odd rows.
D1: =FILTER(B1:B16,mod(ROW(B1:B16),2)=0). Returns values in even rows.
E1: =ArrayFormula(FILTER(C1:C8,{TRUE();C2:C8<>D1:D7+1})). Returns values that start a range.
F1: =ArrayFormula(FILTER(D1:D8,{D1:D7+2<>D2:D8;TRUE()})). Returns values that end a range.
Result
Note: A1 values are not shown for readability.
+----+---+-------+-------+-------+-------+-------+
| | A | B | C | D | E | F |
+----+---+-------+-------+-------+-------+-------+
| 1 | | 29060 | 29060 | 29062 | 29060 | 29062 |
| 2 | | 29062 | 29331 | 29332 | 29331 | 29342 |
| 3 | | 29331 | 29333 | 29334 | 29372 | 29373 |
| 4 | | 29332 | 29335 | 29336 | | |
| 5 | | 29333 | 29337 | 29338 | | |
| 6 | | 29334 | 29339 | 29340 | | |
| 7 | | 29335 | 29341 | 29342 | | |
| 8 | | 29336 | 29372 | 29373 | | |
| 9 | | 29337 | | | | |
| 10 | | 29338 | | | | |
| 11 | | 29339 | | | | |
| 12 | | 29340 | | | | |
| 13 | | 29341 | | | | |
| 14 | | 29342 | | | | |
| 15 | | 29372 | | | | |
| 16 | | 29373 | | | | |
+----+---+-------+-------+-------+-------+-------+

Neo4j/Cypher effective pagination with order by over large sub-graph

I have following simple relationship between (:User) nodes.
(:User)-[:FOLLOWS {timestamp}]->(:User)
If I paginate followers ordered by FOLLOWS.timestamp I'm running into performance problems when someone has millions of followers.
MATCH (u:User {Id:{id}})<-[f:FOLLOWS]-(follower)
WHERE f.timestamp <= {timestamp}
RETURN follower
ORDER BY f.timestamp DESC
LIMIT 100
What is suggested approach for paginating big sets of data when ordering is required?
UPDATE
follower timestamp
---------------------------------------
id(1000000) 1455967905
id(999999) 1455967875
id(999998) 1455967234
id(999997) 1455967123
id(999996) 1455965321
id(999995) 1455964123
id(999994) 1455963645
id(999993) 1455963512
id(999992) 1455961343
....
id(2) 1455909382
id(1) 1455908432
I want to slice this list down using timestamp which set on :FOLLOWS relationship. If I want to return batches of 4 followers I take current timestamp first and return 4 most recent, then 1455967123 and 4 most recent and so on. In order to do this the whole list should be order by timestamp which results in performance issues on millions of records.
If you're looking for the most recent followers, i.e. where the timestamp is greater than a given time, it only has to traverse the most recent ones.
You can do it with (2) in 20ms
If you are really looking for the oldest (first) followers it makes sense to skip ahead and don't look at the timestamp of every of the million followers (which takes about 1s on my system, see (3)). If you skip ahead the time goes down to 230ms, see (1)
In general we can see that on my laptop it does 2M db-operations per core and second.
(1) Look at first / oldest followers
PROFILE
> MATCH (u)<-[f:FOLLOWS]-(follower) WHERE id(u) = 0
> // skip ahead
> WITH f,follower SKIP 999000
> // do the actual check
> WITH f,follower WHERE f.ts < 500
> RETURN f, follower
> ORDER BY f.ts
> LIMIT 10;
+---------------------------------+
| f | follower |
+---------------------------------+
| :FOLLOWS[0]{ts:1} | Node[1]{} |
...
+---------------------------------+
10 rows
243 ms
Compiler CYPHER 2.3 Planner COST Runtime INTERPRETED
+-----------------+----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +ProduceResults | 1 | 10 | 0 | f, follower | f, follower |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection | 1 | 10 | 0 | anon[142], anon[155], anon[158], anon[178], f, follower, f, follower, u | anon[155]; anon[158] |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Top | 1 | 10 | 0 | anon[142], anon[155], anon[158], anon[178], f, follower, u | Literal(10); |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection | 1 | 499 | 499 | anon[142], anon[155], anon[158], anon[178], f, follower, u | anon[155]; anon[158]; anon[155].ts |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection | 1 | 499 | 0 | anon[142], anon[155], anon[158], f, follower, u | f; follower |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Filter | 1 | 499 | 0 | anon[142], f, follower, u | anon[142] |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection | 1 | 1000 | 1000 | anon[142], f, follower, u | f; follower; f.ts < { AUTOINT2} |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Skip | 1 | 1000 | 0 | f, follower, u | { AUTOINT1} |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Expand(All) | 1 | 1000000 | 1000001 | f, follower, u | (u)<-[ f#12:FOLLOWS]-( follower#24) |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +NodeByIdSeek | 1 | 1 | 1 | u | |
+-----------------+----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
Total database accesses: 1001501
(2) Look at most recent followers
PROFILE
> MATCH (u)<-[f:FOLLOWS]-(follower) WHERE id(u) = 0
> AND f.ts > 999500
> RETURN f, follower
> LIMIT 10;
+----------------------------------------------+
| f | follower |
+----------------------------------------------+
| :FOLLOWS[999839]{ts:999840} | Node[999840]{} |
...
+----------------------------------------------+
10 rows
23 ms
Compiler CYPHER 2.3 Planner COST Runtime INTERPRETED
+-----------------+----------------+-------+---------+----------------+---------------------------------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+-------+---------+----------------+---------------------------------------------------------------+
| +ProduceResults | 1 | 10 | 0 | f, follower | f, follower |
| | +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +Limit | 1 | 10 | 0 | f, follower, u | Literal(10) |
| | +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +Filter | 1 | 10 | 16394 | f, follower, u | AndedPropertyComparablePredicates(f,f.ts,f.ts > { AUTOINT1}) |
| | +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +Expand(All) | 1 | 16394 | 16395 | f, follower, u | (u)<-[f:FOLLOWS]-(follower) |
| | +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +NodeByIdSeek | 1 | 1 | 1 | u | |
+-----------------+----------------+-------+---------+----------------+---------------------------------------------------------------+
Total database accesses: 32790
(3) Find oldest followers without skipping ahead
PROFILE
> MATCH (u)<-[f:FOLLOWS]-(follower) WHERE id(u) = 0
> AND f.ts < 500
> RETURN f, follower
> LIMIT 10;
+-------------------------------------+
| f | follower |
+-------------------------------------+
...
| :FOLLOWS[491]{ts:492} | Node[492]{} |
+-------------------------------------+
10 rows
1008 ms
Compiler CYPHER 2.3 Planner COST Runtime INTERPRETED
+-----------------+----------------+--------+---------+----------------+---------------------------------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+--------+---------+----------------+---------------------------------------------------------------+
| +ProduceResults | 1 | 10 | 0 | f, follower | f, follower |
| | +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +Limit | 1 | 10 | 0 | f, follower, u | Literal(10) |
| | +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +Filter | 1 | 10 | 999498 | f, follower, u | AndedPropertyComparablePredicates(f,f.ts,f.ts < { AUTOINT1}) |
| | +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +Expand(All) | 1 | 999498 | 999499 | f, follower, u | (u)<-[f:FOLLOWS]-(follower) |
| | +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +NodeByIdSeek | 1 | 1 | 1 | u | |
+-----------------+----------------+--------+---------+----------------+---------------------------------------------------------------+
Total database accesses: 1998998

Neo4j and Cypher: Match nodes that have a single relationship to a target node

I'm trying to identify nodes that have only one relationship of a given type.
Imagine a graph of Route and Stop nodes. A Route may have 0 or more Stops, a Stop may be shared between multiple Routes, a Stop must always have at least 1 Route. I want to match and delete Stops that will be orphaned if a given Route is deleted.
Before anyone says anything, I know that it would be easier to just find stops without routes after the route is deleted, but that isn't an option. We're also not worried about deleting the routes here, just the stops.
Here's my query:
MATCH (r1:Route { id: {route_id} })-[rel1:HAS_STOP]->(s:Stop)
MATCH (r2:Route)-[rel2:HAS_STOP]->(s)
WITH s, COUNT(rel2) as c
WHERE c = 1
MATCH s-[rel2]-()
DELETE s, rel2
This works perfectly... but is there a better way? It feels like it could be more efficient but I'm not sure how.
EDIT
Here a query that matches only the nodes that will be orphaned without deleting the current route :
MATCH (route:Route {id:'99e08bdf-130f-3fca-8292-27d616fa025f'})
WITH route
OPTIONAL MATCH (route)-[r:HAS_STOP]->(s)
WHERE NOT EXISTS((route)--(s)<-[:HAS_STOP]-())
DELETE r,s
and the execution plan :
neo4j-sh (?)$ PROFILE MATCH (route:Route {id:'99e08bdf-130f-3fca-8292-27d616fa025f'})
> WITH route
> OPTIONAL MATCH (route)-[r:HAS_STOP]->(s)
> WHERE NOT EXISTS((route)--(s)<-[:HAS_STOP]-())
> DELETE r,s;
+-------------------+
| No data returned. |
+-------------------+
Nodes deleted: 2
Relationships deleted: 2
EmptyResult
|
+UpdateGraph
|
+Eager
|
+OptionalMatch
|
+SchemaIndex(1)
|
+Filter
|
+SimplePatternMatcher
|
+SchemaIndex(1)
+----------------------+------+--------+--------------+----------------------------------------------------------------------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+------+--------+--------------+----------------------------------------------------------------------------------------------------+
| EmptyResult | 0 | 0 | | |
| UpdateGraph | 2 | 4 | | DeleteEntity; DeleteEntity |
| Eager | 2 | 0 | | |
| OptionalMatch | 2 | 0 | | |
| SchemaIndex(1) | 1 | 2 | route, route | { AUTOSTRING0}; :Route(id) |
| Filter | 2 | 0 | | NOT(nonEmpty(PathExpression((route)-[ UNNAMED140]-(s),(160)-[ UNNAMED145:HAS_STOP]->(s), true))) |
| SimplePatternMatcher | 2 | 0 | route, s, r | |
| SchemaIndex(1) | 1 | 2 | route, route | { AUTOSTRING0}; :Route(id) |
+----------------------+------+--------+--------------+----------------------------------------------------------------------------------------------------+
Total database accesses: 8
** OLD ANSWER **
I let it here for helping maybe others :
In your query, you're not deleting the route nor the relationships to the stops that will not be orphaned. You can do all in one go.
This is what I have as query for the same use case than you, I also compared the two execution plans on a test graph, each route has about 160 stops and 2 stops that will be orphaned after the route deletion, the graph is available here : http://graphgen.neoxygen.io/?graph=JPnvQWZcQW685m
My query :
MATCH (route:Route {id:'e70ea0d4-03e2-3ca4-afc0-dfdc1754868e'})
WITH route
MATCH (route)-[r:HAS_STOP]->(s)
WITH r, collect(s) as stops
DELETE r, route
WITH filter(x in stops WHERE NOT x--()) as orphans
UNWIND orphans as orphan
DELETE orphan
Here is my profiled query :
neo4j-sh (?)$ PROFILE MATCH (route:Route {id:'1c565ac4-b72b-37c3-be7f-a38f2a7f66a8'})
> WITH route
> MATCH (route)-[r:HAS_STOP]->(s)
> WITH route, r, collect(s) as stops
> DELETE r, route
> WITH filter(x in stops WHERE NOT x--()) as orphans
> UNWIND orphans as orphan
> DELETE orphan;
+-------------------+
| No data returned. |
+-------------------+
Nodes deleted: 2
Relationships deleted: 157
EmptyResult
|
+UpdateGraph(0)
|
+UNWIND
|
+ColumnFilter(0)
|
+Eager
|
+Extract
|
+UpdateGraph(1)
|
+ColumnFilter(1)
|
+EagerAggregation
|
+SimplePatternMatcher
|
+SchemaIndex
+----------------------+------+--------+--------------+------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+------+--------+--------------+------------------------------+
| EmptyResult | 0 | 0 | | |
| UpdateGraph(0) | 1 | 1 | | DeleteEntity |
| UNWIND | 1 | 0 | | |
| ColumnFilter(0) | 157 | 0 | | keep columns orphans |
| Eager | 157 | 0 | | |
| Extract | 157 | 0 | | orphans |
| UpdateGraph(1) | 157 | 158 | | DeleteEntity; DeleteEntity |
| ColumnFilter(1) | 157 | 0 | | keep columns route, r, stops |
| EagerAggregation | 157 | 0 | | route, r |
| SimplePatternMatcher | 157 | 0 | route, s, r | |
| SchemaIndex | 1 | 2 | route, route | { AUTOSTRING0}; :Route(id) |
+----------------------+------+--------+--------------+------------------------------+
Total database accesses: 161
With your query :
I slightly modified your query to make use of schema indexes
And this is the Execution plan with your query, the difference in db accesses is quite high
PROFILE MATCH (r1:Route { id: '1c565ac4-b72b-37c3-be7f-a38f2a7f66a8' })
> WITH r1
> MATCH (r1)-[rel1:HAS_STOP]->(s:Stop)
> MATCH (r2:Route)-[rel2:HAS_STOP]->(s)
> WITH s, COUNT(rel2) as c
> WHERE c = 1
> MATCH s-[rel2]-()
> DELETE s, rel2;
+-------------------+
| No data returned. |
+-------------------+
Nodes deleted: 1
Relationships deleted: 1
EmptyResult
|
+UpdateGraph
|
+Eager
|
+SimplePatternMatcher(0)
|
+Filter(0)
|
+ColumnFilter
|
+EagerAggregation
|
+Filter(1)
|
+SimplePatternMatcher(1)
|
+Filter(2)
|
+SimplePatternMatcher(2)
|
+SchemaIndex
+-------------------------+------+--------+-----------------------+-----------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+-------------------------+------+--------+-----------------------+-----------------------------+
| EmptyResult | 0 | 0 | | |
| UpdateGraph | 1 | 2 | | DeleteEntity; DeleteEntity |
| Eager | 1 | 0 | | |
| SimplePatternMatcher(0) | 1 | 0 | UNNAMED200, s, rel2 | |
| Filter(0) | 1 | 0 | | c == { AUTOINT1} |
| ColumnFilter | 157 | 0 | | keep columns s, c |
| EagerAggregation | 157 | 0 | | s |
| Filter(1) | 4797 | 4797 | | hasLabel(r2:Route(4)) |
| SimplePatternMatcher(1) | 4797 | 4797 | r2, s, rel2 | |
| Filter(2) | 157 | 157 | | hasLabel(s:Stop(3)) |
| SimplePatternMatcher(2) | 157 | 157 | r1, s, rel1 | |
| SchemaIndex | 1 | 2 | r1, r1 | { AUTOSTRING0}; :Route(id) |
+-------------------------+------+--------+-----------------------+-----------------------------+
Total database accesses: 9912

Resources