neo4j aggregate function by distance

I want to have some aggregated statistics by distance from root. For example,
(A)-[value:20]->(B)-[value:40]->(C)
(A)-[value:0]->(D)-[value:20]->(E)
CREATE (:firm {name:'A'}), (:firm {name:'B'}), (:firm {name:'C'}), (:firm {name:'D'}), (:firm {name:'E'});
MATCH (a:firm {name:'A'}), (b:firm {name:'B'}), (c:firm {name:'C'}), (d:firm {name:'D'}), (e:firm {name:'E'})
CREATE (a)-[:REL {value: 20}]->(b)-[:REL {value: 40}]->(c),
(a)-[:REL {value: 0}]->(d)-[:REL {value: 20}]->(e);
I want to get the average value of A's immediate neighbors and that of the 2nd layer neighbors, i.e.,
+-------------------+
| distance | avg |
+-------------------+
| 1 | 10 |
| 2 | 30 |
+-------------------+
How should I do it? I have tried the following
MATCH p=(n:NODE {name:'A'})-[r:REL*1..2]->(n:NODE)
RETURN length(p), sum(r:value);
But I am not sure how to operate on the variable-length path r.
Similarly, is it possible to get the cumulative value? i.e.,
+-------------------+
| name | cum |
+-------------------+
| B | 20 |
| C | 60 |
| D | 0 |
| E | 20 |
+-------------------+

The query below solves the first problem. Note that it also handles paths of unequal length; to test that, I added (E)-[:REL {value:99}]->(F).
MATCH path=(:firm {name:'A'})-[:REL*]->(leaf:firm)
WHERE NOT (leaf)-[:REL]->(:firm)
WITH COLLECT(path) AS paths, max(length(path)) AS longest
UNWIND RANGE(1,longest) AS depth
WITH depth,
REDUCE(sum=0, path IN [p IN paths WHERE length(p) >= depth] |
sum
+ relationships(path)[depth-1].value
) AS sumAtDepth,
SIZE([p IN paths WHERE length(p) >= depth]) AS countAtDepth
RETURN depth, sumAtDepth, countAtDepth, sumAtDepth/countAtDepth AS avgAtDepth
returning
╒═══════╤════════════╤══════════════╤════════════╕
│"depth"│"sumAtDepth"│"countAtDepth"│"avgAtDepth"│
╞═══════╪════════════╪══════════════╪════════════╡
│1 │20 │2 │10 │
├───────┼────────────┼──────────────┼────────────┤
│2 │60 │2 │30 │
├───────┼────────────┼──────────────┼────────────┤
│3 │99 │1 │99 │
└───────┴────────────┴──────────────┴────────────┘
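To make the depth indexing easier to follow, here is a minimal plain-Python model of the same computation. It is not Neo4j code: the graph (including the extra E->F edge) is hard-coded as a hypothetical adjacency dict, and the helper names are made up for illustration. It collects every root-to-leaf path as a list of relationship values, then, for each depth, sums the value at position depth-1 over the paths that are long enough.

```python
# Hypothetical adjacency dict for the example graph plus the extra E->F edge:
# node -> list of (child, relationship value)
GRAPH = {
    "A": [("B", 20), ("D", 0)],
    "B": [("C", 40)],
    "D": [("E", 20)],
    "E": [("F", 99)],
}


def leaf_paths(node, values=()):
    """Yield the relationship values along every path from node down to a leaf."""
    children = GRAPH.get(node, [])
    if not children and values:
        yield list(values)
    for child, value in children:
        yield from leaf_paths(child, values + (value,))


def stats_by_depth(root):
    """Return (depth, sumAtDepth, countAtDepth, avgAtDepth) rows, mirroring the query."""
    paths = list(leaf_paths(root))
    longest = max(len(p) for p in paths)
    rows = []
    for depth in range(1, longest + 1):  # Cypher's RANGE(1, longest) includes longest
        deep_enough = [p for p in paths if len(p) >= depth]
        total = sum(p[depth - 1] for p in deep_enough)  # relationships(path)[depth-1].value
        count = len(deep_enough)
        rows.append((depth, total, count, total // count))  # integer division, as with Cypher integers here
    return rows


for row in stats_by_depth("A"):
    print(row)
```

Because the sums are taken per leaf path, a relationship shared by several leaf paths (for example, if B had two children) would be counted once per path, both in this model and in the Cypher query.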
The second question can be answered as follows:
MATCH (root:firm {name:'A'})
MATCH (descendant:firm) WHERE EXISTS((root)-[:REL*]->(descendant))
WITH root,descendant
WITH descendant,
REDUCE(sum=0,rel IN relationships([(descendant)<-[:REL*]-(root)][0][0]) |
sum + rel.value
) AS cumulative
RETURN descendant.name,cumulative ORDER BY descendant.name
returning
╒═════════════════╤════════════╕
│"descendant.name"│"cumulative"│
╞═════════════════╪════════════╡
│"B" │20 │
├─────────────────┼────────────┤
│"C" │60 │
├─────────────────┼────────────┤
│"D" │0 │
├─────────────────┼────────────┤
│"E" │20 │
├─────────────────┼────────────┤
│"F" │119 │
└─────────────────┴────────────┘
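The cumulative value is just a running sum of relationship values from the root. A minimal plain-Python sketch of that idea (not Neo4j code; the same hypothetical adjacency dict as above, including E->F) walks the graph depth-first and carries the running total along:

```python
# Hypothetical adjacency dict: node -> list of (child, relationship value)
GRAPH = {
    "A": [("B", 20), ("D", 0)],
    "B": [("C", 40)],
    "D": [("E", 20)],
    "E": [("F", 99)],
}


def cumulative_values(root):
    """Map each descendant of root to the sum of relationship values on the path reaching it."""
    result = {}
    stack = [(root, 0)]
    while stack:
        node, total = stack.pop()
        for child, value in GRAPH.get(node, []):
            # In a tree there is exactly one path per node; if there were several,
            # this keeps whichever is reached first, much as [0][0] picks one path
            if child not in result:
                result[child] = total + value
                stack.append((child, total + value))
    return result


print(sorted(cumulative_values("A").items()))
```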

May I suggest you try it with the reduce() function? You can retrofit it to your code:
// Match something by name or distance (this pattern is only a placeholder)
MATCH (n {name: 'A'})-[*1..2]->(m)
// If you have a condition, put it here, e.g.
// WHERE n <> m
WITH collect(m) AS myItems
// reduce() sums/aggregates over the entire collected list
// (`cost` is a placeholder property name)
RETURN reduce(sum = 0, x IN myItems | sum + coalesce(x.cost, 0)) AS total
LIMIT 10;
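Cypher's reduce(acc = init, x IN list | expr) is a left fold over the list. As a rough plain-Python analogy (not Neo4j code), it behaves like functools.reduce with an initial value; the cost values below are made-up sample data.

```python
from functools import reduce

# Hypothetical collected items, standing in for collect(m) AS myItems
my_items = [{"cost": 5}, {"cost": 12}, {"cost": 3}]

# reduce(sum = 0, x IN myItems | sum + x.cost) folds left to right, starting from 0
total = reduce(lambda acc, x: acc + x["cost"], my_items, 0)
print(total)  # 20
```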

Related

extract decorating nodes if they exist, but still return the path if they do not exist

I have the following graph
(a1:A) -> (b1:B) -> (c1:C) -> (y1:Y)
(a2:A) -> (b2:B) -> (c2:C) -> (d1:D) -> (e1:E)
(a3:A) -> (b3:B) -> (c3:C)
I would like to find paths between nodes labeled A and C. I can use the query
match p=((:A)-[*]->(:C))
return p
But I also want to get the nodes labeled Y, D, and E if these decorating nodes exist. If I try:
match p=((:A)-[*]->(cc:C)), (cc)-->(yy:Y), (cc)-[*]->(dd:D)-[*]->(ee:E)
return p, yy, dd, ee
Then it only returns the path if the C node has Y, D, and E connected to it.
The output that I need is:
a1->b1->c1, y1, null
a2->b2->c2, null, [[d1, e1]]
a3->b3->c3, null, null
I.e., if a decorating node does not exist, just return null. For the array, either null or an empty array is fine. Also, the D and E nodes should be grouped into an array of arrays, since there could be many D-E pairs.
What is the best way to achieve this?

This should do it, returning an empty array for deDecoration if there aren't any D-E decorations:
MATCH p=((:A)-[*]->(c:C))
WITH p,
HEAD([(c)--(y:Y) | y ]) AS yDecoration,
[(c)-[*]->(d:D)-[*]->(e:E) | [d,e]] AS deDecoration
RETURN p, yDecoration, deDecoration
With a test graph that contains multiple D-E pairs, this query
MATCH p=((:A)-[*]->(c:C))
WITH REDUCE(s='' , node IN nodes(p) | s + CASE WHEN s='' THEN '' ELSE '->' END + node.name) AS p,
HEAD([(c)--(y:Y) | y.name ]) AS yDecoration,
[(c)-[*]->(d:D)-[*]->(e:E) | [d.name,e.name]] AS deDecoration
RETURN p, yDecoration, deDecoration
returns
╒════════════╤═════════════╤═════════════════════════╕
│"p" │"yDecoration"│"deDecoration" │
╞════════════╪═════════════╪═════════════════════════╡
│"A2->B2->C2"│null │[] │
├────────────┼─────────────┼─────────────────────────┤
│"A1->B1->C1"│null │[["D2","E2"],["D1","E1"]]│
├────────────┼─────────────┼─────────────────────────┤
│"A3->B3->C3"│"Y1" │[] │
└────────────┴─────────────┴─────────────────────────┘
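Every A-C path survives because a pattern comprehension such as [(c)-[*]->(d:D)-[*]->(e:E) | [d,e]] evaluates to a list (empty when nothing matches) instead of filtering out the row the way an extra MATCH pattern does. Below is a minimal plain-Python model of that per-C decoration step. It is not Neo4j code: it assumes the question's original graph (c1->y1 and c2->d1->e1) encoded as a hypothetical edge dict, derives labels from node names, and for simplicity follows only outgoing edges for the Y lookup even though (c)--(y:Y) is undirected.

```python
# Hypothetical encoding of the question's graph: node -> list of successor nodes
EDGES = {
    "a1": ["b1"], "b1": ["c1"], "c1": ["y1"],
    "a2": ["b2"], "b2": ["c2"], "c2": ["d1"], "d1": ["e1"],
    "a3": ["b3"], "b3": ["c3"],
}


def label(node):
    # Assumption: a node's label is the upper-cased first letter of its name
    return node[0].upper()


def reachable(node):
    """All nodes reachable from node via one or more outgoing edges."""
    seen, stack = set(), list(EDGES.get(node, []))
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(EDGES.get(n, []))
    return seen


def decorate(c):
    # HEAD([(c)--(y:Y) | y]): the first Y neighbour, or None when the list is empty
    ys = [n for n in EDGES.get(c, []) if label(n) == "Y"]
    y = ys[0] if ys else None
    # [(c)-[*]->(d:D)-[*]->(e:E) | [d, e]]: an empty list, not a missing row, when nothing matches
    de = sorted([d, e] for d in reachable(c) if label(d) == "D"
                for e in reachable(d) if label(e) == "E")
    return y, de


for c in ["c1", "c2", "c3"]:
    print(c, *decorate(c))
```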

neo4j: type counts by distance

I want to have some count statistics by type by distance from root. For example,
(A type:'private')-[value:20]->(B type:'private')-[value:40]->(C type:'private')
(A type:'private')-[value:0]->(D type:'public')-[value:20]->(E type:'private')
CREATE (:firm {name:'A', type:'private'}), (:firm {name:'B', type:'private'}), (:firm {name:'C', type:'private'}), (:firm {name:'D', type:'public'}), (:firm {name:'E', type:'private'});
MATCH (a:firm {name:'A'}), (b:firm {name:'B'}), (c:firm {name:'C'}), (d:firm {name:'D'}), (e:firm {name:'E'})
CREATE (a)-[:REL {value: 20}]->(b)-[:REL {value: 40}]->(c),
(a)-[:REL {value: 0}]->(d)-[:REL {value: 20}]->(e);
I want to get the count of each type of A's immediate neighbors and that of the 2nd layer neighbors, i.e.,
+-----------------------------+
| distance | type | count |
+-----------------------------+
| 0 | private | 1 |
| 0 | public | 0 |
| 1 | private | 1 |
| 1 | public | 1 |
| 2 | private | 2 |
| 2 | public | 0 |
+-----------------------------+
Here is a related question about aggregate statistics by distance.
Thanks!

For this one, the APOC library comes in handy:
MATCH path=(:firm {name:'A'})-[:REL*]->(leaf:firm)
WHERE NOT (leaf)-[:REL]->(:firm)
WITH COLLECT(path) AS paths, max(length(path)) AS longest
UNWIND RANGE(0,longest) AS depth
WITH depth,
apoc.coll.frequencies([node IN apoc.coll.toSet(REDUCE(arr=[], path IN [p IN paths WHERE length(p) >= depth] |
arr
+ nodes(path)[depth]
)
) | node.type
]) as typesAtDepth
UNWIND typesAtDepth AS typeAtDepth
RETURN depth, typeAtDepth.item AS type, typeAtDepth.count AS count
For this dataset:
CREATE (_174:`firm` { `name`: 'A', `type`: 'type2' })
CREATE (_200:`firm` { `name`: 'D', `type`: 'type2' })
CREATE (_202:`firm` { `name`: 'E', `type`: 'type2' })
CREATE (_203:`firm` { `name`: 'F', `type`: 'type1' })
CREATE (_191:`firm` { `name`: 'B', `type`: 'type1' })
CREATE (_193:`firm` { `name`: 'C', `type`: 'type2' })
CREATE (_174)-[:`REL` { `value`: '0' }]->(_200)
CREATE (_200)-[:`REL` { `value`: '20' }]->(_202)
CREATE (_202)-[:`REL` { `value`: '99' }]->(_203)
CREATE (_174)-[:`REL` { `value`: '20' }]->(_191)
CREATE (_191)-[:`REL` { `value`: '40' }]->(_193)
it returns this result:
╒═══════╤═══════╤═══════╕
│"depth"│"type" │"count"│
╞═══════╪═══════╪═══════╡
│0 │"type2"│1 │
├───────┼───────┼───────┤
│1 │"type2"│1 │
├───────┼───────┼───────┤
│1 │"type1"│1 │
├───────┼───────┼───────┤
│2 │"type2"│2 │
├───────┼───────┼───────┤
│3 │"type1"│1 │
└───────┴───────┴───────┘
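The same per-depth counting can be sketched in plain Python (not Neo4j code) as a level-by-level walk that keeps a set of distinct nodes per depth, playing the role of apoc.coll.toSet, and counts their types the way apoc.coll.frequencies does. The dataset above is hard-coded as hypothetical dicts, and the walk assumes a tree or DAG (no cycles).

```python
from collections import Counter

# Hypothetical encoding of the answer's test dataset
TYPES = {"A": "type2", "B": "type1", "C": "type2", "D": "type2", "E": "type2", "F": "type1"}
CHILDREN = {"A": ["D", "B"], "D": ["E"], "E": ["F"], "B": ["C"]}


def type_counts_by_depth(root):
    """Return {depth: Counter(type -> number of distinct nodes at that depth)}."""
    result = {}
    level = {root}  # a set, so a node reached by several paths counts once per depth
    depth = 0
    while level:
        result[depth] = Counter(TYPES[n] for n in level)
        level = {child for n in level for child in CHILDREN.get(n, [])}
        depth += 1
    return result


for depth, counts in type_counts_by_depth("A").items():
    print(depth, dict(counts))
```

Neither this sketch nor the query emits zero-count rows such as the question's "public | 0"; producing those would need a separate list of all known types to compare against.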

How to match node labels using OR?

match (p:Product {id:'5116003'})-[r]->(o:Attributes|ExtraAttribute) return p, o
How to match two possible node labels in such a query?
Per cybersam's suggestion, I changed it to the following:
MATCH (p:Product {id:'5116003'})-[r]->(o)
WHERE o:Attributes OR o:ExtraAttributes
**WHERE any(key in keys(o) WHERE toLower(key) contains 'weight')**
return o
Now I need to add the 2nd 'where' clause. How to modify that?

You can try using the any() function:
match (p:Product {id:'5116003'})-[r]->(o)
where any (label in labels(o) where label in ['Attributes', 'ExtraAttribute'])
return p, o
Also, if you have APOC procedures installed, you can use the apoc.path.expand procedure, which expands from a start node following the given relationships from min to max level, adhering to the label filters.
match (p:Product {id:'5116003'})
call apoc.path.expand(p, null, "+Attributes|ExtraAttribute", 1, 1) yield path
with nodes(path) as nodes
// return p and o nodes (minLevel 1 skips the zero-length path that contains only p)
return nodes[0], nodes[1]
See the APOC path expander documentation for more.

These two single-label forms of your query:
MATCH (p:Product {id:'5116003'})-->(o:Attributes) RETURN p, o;
MATCH (p:Product {id:'5116003'})-->(o) WHERE o:Attributes RETURN p, o;
produce the same execution plan, as follows (I assume that there is an index on :Product(id)):
+-----------------+----------------+------+---------+------------------+--------------+
| Operator | Estimated Rows | Rows | DB Hits | Variables | Other |
+-----------------+----------------+------+---------+------------------+--------------+
| +ProduceResults | 0 | 0 | 0 | o, p | p, o |
| | +----------------+------+---------+------------------+--------------+
| +Filter | 0 | 0 | 0 | anon[33], o, p | o:Attributes |
| | +----------------+------+---------+------------------+--------------+
| +Expand(All) | 0 | 0 | 0 | anon[33], o -- p | (p)-->(o) |
| | +----------------+------+---------+------------------+--------------+
| +NodeIndexSeek | 0 | 0 | 1 | p | :Product(id) |
+-----------------+----------------+------+---------+------------------+--------------+
This two-label form of the second query above:
MATCH (p:Product {id:'5116003'})-->(o) WHERE o:Attributes OR o:ExtraAttribute RETURN p, o;
produces an execution plan that is very similar (and therefore probably not much more expensive):
+-----------------+----------------+------+---------+------------------+-------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Variables | Other |
+-----------------+----------------+------+---------+------------------+-------------------------------------+
| +ProduceResults | 0 | 0 | 0 | o, p | p, o |
| | +----------------+------+---------+------------------+-------------------------------------+
| +Filter | 0 | 0 | 0 | anon[33], o, p | Ors(o:Attributes, o:ExtraAttribute) |
| | +----------------+------+---------+------------------+-------------------------------------+
| +Expand(All) | 0 | 0 | 0 | anon[33], o -- p | (p)-->(o) |
| | +----------------+------+---------+------------------+-------------------------------------+
| +NodeIndexSeek | 0 | 0 | 1 | p | :Product(id) |
+-----------------+----------------+------+---------+------------------+-------------------------------------+
By the way, the first query in the answer by @BrunoPeres has a similar execution plan as well, but its Filter operation is very different. It is not clear which would be faster.
[UPDATE]
To answer your updated question: since you cannot have 2 back-to-back WHERE clauses, you can just add more terms to the already existing WHERE clause, like so:
MATCH (p:Product {id:'5116003'})-[r]->(o)
WHERE
(o:Attributes OR o:ExtraAttributes) AND
ANY(key in KEYS(o) WHERE TOLOWER(key) CONTAINS 'weight')
RETURN o;
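Since both conditions live in a single WHERE clause, they simply combine with boolean AND. A plain-Python sketch of the same predicate, using made-up nodes modeled as dicts of labels and properties (hypothetical data, not a Neo4j API), shows how each part is evaluated:

```python
# Hypothetical candidate nodes: each has a set of labels and a property map
candidates = [
    {"labels": {"Attributes"}, "props": {"Weight_kg": 2, "color": "red"}},
    {"labels": {"ExtraAttributes"}, "props": {"size": "L"}},
    {"labels": {"Price"}, "props": {"weight": 1}},
]


def matches(node):
    # (o:Attributes OR o:ExtraAttributes) AND ANY(key IN KEYS(o) WHERE TOLOWER(key) CONTAINS 'weight')
    has_label = "Attributes" in node["labels"] or "ExtraAttributes" in node["labels"]
    has_weight_key = any("weight" in key.lower() for key in node["props"])
    return has_label and has_weight_key


print([matches(n) for n in candidates])  # [True, False, False]
```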

Make a path where next node is not the previous node?

I have ~1.5 M nodes in a graph, that are structured like this (picture)
I run a Cypher query that performs calculations on each relationship traversed:
WITH 1 AS startVal
MATCH x = (c:Currency)-[r:Arb*2]->(m)
WITH x, REDUCE(s = startVal, e IN r | s * e.rate) AS endVal, startVal
RETURN EXTRACT(n IN NODES(x) | n) as Exchanges,
extract ( e IN relationships(x) | startVal * e.rate) AS Rel,
endVal, endVal - startVal AS Profit
ORDER BY Profit DESC LIMIT 5
The problem is it returns the path ("One")->("hop")->("One"), which is useless for me.
How can I make it not choose the previously walked node as the next node (i.e. "One"->"hop"->"any_other_node_but_not_"one")?
I have read that NODE_RECENT should address my issue. However, I found no example of how to specify the number of recent nodes to remember via the REST API or APOC procedures.
Is there a Cypher query for my case?
Thank you.
P.S. I am extremely new (less than 2 months) to Neo4j and coding, so my apologies if there is an obvious simple solution.

I don't know if I understood your question completely, but I believe your problem can be solved by putting a WHERE clause on the MATCH to prevent the undesired relationship from being matched, like this:
WITH 1 AS startVal
MATCH x = (c:Currency)-[r:Arb*2]->(m)
WHERE NOT (m)-[:Arb]->(c)
WITH x, REDUCE(s = startVal, e IN r | s * e.rate) AS endVal, startVal
RETURN EXTRACT(n IN NODES(x) | n) as Exchanges,
extract ( e IN relationships(x) | startVal * e.rate) AS Rel,
endVal, endVal - startVal AS Profit
ORDER BY Profit DESC LIMIT 5

Try inserting this clause after your MATCH clause, to filter out cases where c and m are the same:
WHERE c <> m
[EDITED]
That is:
WITH 1 AS startVal
MATCH x = (c:Currency)-[r:Arb*2]->(m)
WHERE c <> m
WITH x, REDUCE(s = startVal, e IN r | s * e.rate) AS endVal, startVal
RETURN EXTRACT(n IN NODES(x) | n) as Exchanges,
extract ( e IN relationships(x) | startVal * e.rate) AS Rel,
endVal, endVal - startVal AS Profit
ORDER BY Profit DESC LIMIT 5;
After using this query to create test data:
CREATE
(c:Currency {name: 'One'})-[:Arb {rate:1}]->(h:Account {name: 'hop'})-[:Arb {rate:2}]->(t:Currency {name: 'Two'}),
(t)-[:Arb {rate:3}]->(h)-[:Arb {rate:4}]->(c)
the above query produces these results:
+-----------------------------------------------------------------------------------------+
| Exchanges | Rel | endVal | Profit |
+-----------------------------------------------------------------------------------------+
| [Node[8]{name:"Two"},Node[7]{name:"hop"},Node[6]{name:"One"}] | [3,4] | 12 | 11 |
| [Node[6]{name:"One"},Node[7]{name:"hop"},Node[8]{name:"Two"}] | [1,2] | 2 | 1 |
+-----------------------------------------------------------------------------------------+
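To check those numbers by hand, here is a plain-Python re-enactment of the query on the test data above (not Neo4j code; the graph is hard-coded as (from, to, rate) triples). It enumerates 2-hop paths starting at a Currency node, drops paths that return to the start (WHERE c <> m), multiplies the rates, and sorts by profit.

```python
# The test data above as (from, to, rate) triples; "hop" is the Account node
RELS = [("One", "hop", 1), ("hop", "Two", 2), ("Two", "hop", 3), ("hop", "One", 4)]
CURRENCIES = {"One", "Two"}


def two_hop_paths(start_val=1):
    rows = []
    for c, h, r1 in RELS:
        if c not in CURRENCIES:
            continue  # (c:Currency)
        for h2, m, r2 in RELS:
            # The second hop must continue from h; WHERE c <> m drops round trips.
            # There are no self-loops here, so the two relationships are always distinct.
            if h2 != h or m == c:
                continue
            end_val = start_val * r1 * r2  # REDUCE(s = startVal, e IN r | s * e.rate)
            rows.append(([c, h, m], [start_val * r1, start_val * r2], end_val, end_val - start_val))
    return sorted(rows, key=lambda row: row[3], reverse=True)[:5]  # ORDER BY Profit DESC LIMIT 5


for row in two_hop_paths():
    print(row)
```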

How to get the index of FOREACH iterations

Within a FOREACH statement [e.g. day in range(dayX, dayY)], is there an easy way to find out the index of the iteration?

Yes, you can.
Here is an example query that creates 8 Day nodes that contain an index and day:
WITH 5 AS day1, 12 AS day2
FOREACH (i IN RANGE(0, day2-day1) |
CREATE (:Day { index: i, day: day1+i }));
This query prints out the resulting nodes:
MATCH (d:Day)
RETURN d
ORDER BY d.index;
and here is an example result:
+--------------------------+
| d |
+--------------------------+
| Node[54]{day:5,index:0} |
| Node[55]{day:6,index:1} |
| Node[56]{day:7,index:2} |
| Node[57]{day:8,index:3} |
| Node[58]{day:9,index:4} |
| Node[59]{day:10,index:5} |
| Node[60]{day:11,index:6} |
| Node[61]{day:12,index:7} |
+--------------------------+
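One detail worth spelling out is that Cypher's range() includes its upper bound, which is why RANGE(0, 12-5) produces 8 indexes. A short Python analogy (not Neo4j code) that builds the same index/day pairs shows the off-by-one difference from Python's range():

```python
# Plain-Python analogy of FOREACH (i IN RANGE(0, day2-day1) | CREATE (:Day {index: i, day: day1+i}))
day1, day2 = 5, 12

# Cypher's RANGE(0, n) includes n, while Python's range(n) stops before n, hence the + 1
days = [{"index": i, "day": day1 + i} for i in range(day2 - day1 + 1)]

print(len(days), days[0], days[-1])
```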

FOREACH does not yield the index during iteration. If you want the index, you can use a combination of range and UNWIND, like this:
WITH ["some", "array", "of", "things"] AS things
UNWIND range(0,size(things)-2) AS i
// Do something for each element in the array. In this case, connect consecutive Things.
// MERGE each node separately first, so a name shared by two pairs ends up as one node
MERGE (t1:Thing {name:things[i]})
MERGE (t2:Thing {name:things[i+1]})
MERGE (t1)-[:RELATED_TO]->(t2)
This example iterates a counter i, which you can use to access the item at index i in the array.
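The pairing logic can be mirrored in plain Python (not Neo4j code) to see which relationships the query is meant to create. Note again that Cypher's range(0, size(things)-2) is inclusive, so the Python equivalent is range(len(things) - 1). When each node is merged on its own by name, names that appear in two pairs ("array", "of") map to a single node, so four distinct Things result.

```python
things = ["some", "array", "of", "things"]

# UNWIND range(0, size(things)-2) AS i: Cypher's range is inclusive, so Python needs len(things) - 1
pairs = [(things[i], things[i + 1]) for i in range(len(things) - 1)]

# Merging nodes by name: a shared name maps to one node, so a set models the resulting Things
nodes = {name for pair in pairs for name in pair}

print(pairs)
print(len(nodes))
```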
