I want to compute some aggregated statistics by distance from the root. For example:
(A)-[value:20]->(B)-[value:40]->(C)
(A)-[value:0]->(D)-[value:20]->(E)
CREATE (:firm {name:'A'}), (:firm {name:'B'}), (:firm {name:'C'}), (:firm {name:'D'}), (:firm {name:'E'});
MATCH (a:firm {name:'A'}), (b:firm {name:'B'}), (c:firm {name:'C'}), (d:firm {name:'D'}), (e:firm {name:'E'})
CREATE (a)-[:REL {value: 20}]->(b)-[:REL {value: 40}]->(c),
(a)-[:REL {value: 0}]->(d)-[:REL {value: 20}]->(e);
I want to get the average relationship value for A's immediate neighbors and for its second-layer neighbors, i.e.:
+----------+-----+
| distance | avg |
+----------+-----+
| 1        | 10  |
| 2        | 30  |
+----------+-----+
How should I do this? I have tried the following:
MATCH p=(n:NODE {name:'A'})-[r:REL*1..2]->(n:NODE)
RETURN length(p), sum(r:value);
But I am not sure how to operate on the variable-length path r.
Similarly, is it possible to get the cumulative value along the path from A to each node, i.e.:
+------+-----+
| name | cum |
+------+-----+
| B    | 20  |
| C    | 60  |
| D    | 0   |
| E    | 20  |
+------+-----+
The query below solves the first problem. Note that it also handles paths of unequal length; to show this, I added (E)-[:REL {value:99}]->(F).
MATCH path=(:firm {name:'A'})-[:REL*]->(leaf:firm)
WHERE NOT (leaf)-[:REL]->(:firm)
WITH COLLECT(path) AS paths, max(length(path)) AS longest
UNWIND RANGE(1,longest) AS depth
WITH depth,
REDUCE(sum=0, path IN [p IN paths WHERE length(p) >= depth] |
sum
+ relationships(path)[depth-1].value
) AS sumAtDepth,
SIZE([p IN paths WHERE length(p) >= depth]) AS countAtDepth
RETURN depth, sumAtDepth, countAtDepth, sumAtDepth/countAtDepth AS avgAtDepth
This returns:
╒═══════╤════════════╤══════════════╤════════════╕
│"depth"│"sumAtDepth"│"countAtDepth"│"avgAtDepth"│
╞═══════╪════════════╪══════════════╪════════════╡
│1 │20 │2 │10 │
├───────┼────────────┼──────────────┼────────────┤
│2 │60 │2 │30 │
├───────┼────────────┼──────────────┼────────────┤
│3 │99 │1 │99 │
└───────┴────────────┴──────────────┴────────────┘
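To make the per-depth arithmetic easier to check, here is a rough client-side Python equivalent of the query above. It assumes the graph is available as a hypothetical `edges` mapping from each firm to its outgoing `(child, value)` pairs (the question's data plus the extra E -> F edge). Like the Cypher, it enumerates root-to-leaf paths and, at each depth, sums and counts the relationship values of the paths that reach that depth.

```python
from collections import defaultdict

# Hypothetical adjacency list: firm -> list of (child, relationship value).
edges = {
    "A": [("B", 20), ("D", 0)],
    "B": [("C", 40)],
    "D": [("E", 20)],
    "E": [("F", 99)],
}

def leaf_paths(node, prefix=()):
    """Yield the relationship values along every path from node down to a leaf."""
    children = edges.get(node, [])
    if not children:
        yield list(prefix)
        return
    for child, value in children:
        yield from leaf_paths(child, prefix + (value,))

def stats_by_depth(root):
    """Return {depth: (sum, count, average)} over all root-to-leaf paths."""
    sums, counts = defaultdict(int), defaultdict(int)
    for values in leaf_paths(root):
        # values[depth - 1] is the relationship ending at this depth,
        # mirroring relationships(path)[depth-1] in the Cypher query
        for depth, value in enumerate(values, start=1):
            sums[depth] += value
            counts[depth] += 1
    return {d: (sums[d], counts[d], sums[d] / counts[d]) for d in sorted(sums)}

for depth, (total, count, avg) in stats_by_depth("A").items():
    print(depth, total, count, avg)
```

One difference worth knowing: Python's `/` always returns a float, whereas Cypher's `/` on two integers truncates, so wrap one operand in `toFloat()` in the Cypher query if you need fractional averages.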
The second question can be answered as follows:
MATCH (root:firm {name:'A'})
MATCH (descendant:firm) WHERE EXISTS((root)-[:REL*]->(descendant))
WITH descendant,
// sum the values along the first path found from root to this descendant
REDUCE(sum=0, rel IN relationships([path = (root)-[:REL*]->(descendant) | path][0]) |
sum + rel.value
) AS cumulative
RETURN descendant.name,cumulative ORDER BY descendant.name
This returns:
╒═════════════════╤════════════╕
│"descendant.name"│"cumulative"│
╞═════════════════╪════════════╡
│"B" │20 │
├─────────────────┼────────────┤
│"C" │60 │
├─────────────────┼────────────┤
│"D" │0 │
├─────────────────┼────────────┤
│"E" │20 │
├─────────────────┼────────────┤
│"F" │119 │
└─────────────────┴────────────┘
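For comparison, the same cumulative sums can be computed client-side with a simple depth-first walk. This is a sketch assuming a hypothetical `edges` adjacency list (the question's data plus the extra E -> F edge); if a node were reachable by more than one path, only the first one reached is kept, which is similar in spirit to taking element [0] in the Cypher query.

```python
# Hypothetical adjacency list: firm -> list of (child, relationship value).
edges = {
    "A": [("B", 20), ("D", 0)],
    "B": [("C", 40)],
    "D": [("E", 20)],
    "E": [("F", 99)],
}

def cumulative_values(root):
    """Map each descendant of root to the sum of relationship values on the path to it."""
    result = {}
    stack = [(root, 0)]  # (node, sum of values from root to node)
    while stack:
        node, total = stack.pop()
        for child, value in edges.get(node, []):
            # Skip nodes already reached (and the root itself, in case of cycles)
            if child != root and child not in result:
                result[child] = total + value
                stack.append((child, total + value))
    return result

for name, cum in sorted(cumulative_values("A").items()):
    print(name, cum)
```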
May I suggest you try it with the reduce function? You can retrofit it into your code:
// Match the nodes you want to aggregate, e.g. by name or distance (Item is a placeholder label)
MATCH (m:Item)
// If you have a condition, put it here
// WHERE m.name = 'x'
WITH collect(m) AS myItems
// reduce() folds the collected list into a single value, here the total cost
RETURN reduce(sum = 0, x IN myItems | sum + x.cost) AS totalCost
LIMIT 10;
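Cypher's reduce(acc = initial, x IN list | expression) folds a list into one value in the same way as Python's `functools.reduce`. A small illustration, with a hypothetical `my_items` list standing in for the collected myItems and their cost property:

```python
from functools import reduce

# Hypothetical stand-in for collect(m) AS myItems: each item carries a "cost" property.
my_items = [{"name": "x", "cost": 3}, {"name": "y", "cost": 5}, {"name": "z", "cost": 4}]

# Same idea as reduce(sum=0, x IN myItems | sum + x.cost):
# start the accumulator at 0 and add each item's cost to it in turn.
total_cost = reduce(lambda acc, item: acc + item["cost"], my_items, 0)
print(total_cost)  # 12
```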
I have this Cypher query:
MATCH (Parent)-[R]-(Child) WHERE ID(Parent)=$parentId
CALL {
WITH Child
RETURN apoc.node.degree(Child) as ChildDegree
}
WITH Parent, Child, R, ChildDegree
RETURN Parent, Child, type(R), ChildDegree
ORDER BY R
LIMIT 35
It returns at most 35 rows, and that limit is what bothers me. Imagine that Parent has these children:
40 x A
3 x B
2 x C
In this situation my query sometimes returns 35 x A. What I'd like is for the query to order by the rarest type of child for this parent, so that for this example it returns:
2 x C
3 x B
30 x A
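The desired behavior can be stated as a simple rule: group the children by type, visit the groups from rarest to most common, and keep taking children until the limit is reached. A client-side Python sketch of that rule, assuming a hypothetical flat list of child types (this only illustrates the target result, not a Cypher solution):

```python
from collections import Counter

def rarest_first(child_types, limit=35):
    """Return (type, how many to take) pairs, rarest type first, capped at limit in total."""
    counts = Counter(child_types)
    taken = []
    remaining = limit
    # Ties between equally rare types are broken by type name to keep the order stable.
    for child_type, count in sorted(counts.items(), key=lambda kv: (kv[1], kv[0])):
        if remaining <= 0:
            break
        take = min(count, remaining)
        taken.append((child_type, take))
        remaining -= take
    return taken

children = ["A"] * 40 + ["B"] * 3 + ["C"] * 2
print(rarest_first(children))  # [('C', 2), ('B', 3), ('A', 30)]
```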
I tested the query below using the Movie database. It works as follows:
Collect Parent, Child, R and ChildDegree into a list of maps (collect_nodes), and collect all child degrees into a separate list (collect_degs)
Create a range of indexes, one per row (range_idx), to accumulate the sum of child degrees
For each index from 0 to the number of rows minus 1, compute the running sum of degrees up to and including that row (sum_degree)
For each parent, child, R and child degree, keep the row only if sum_degree <= 35
Return the parent, child, R and child degree
Note that LIMIT restricts the number of rows, not the sum of child degrees, so this query filters on the running sum of degrees instead, and that sum will not necessarily come out at exactly 35. Also, please show us sample data to work on so that we can give you the best answer.
MATCH (Parent)-[R]-(Child) WHERE ID(Parent)=$parentId
CALL {
WITH Child
RETURN apoc.node.degree(Child) as ChildDegree
}
WITH Parent, Child, type(R) as R, ChildDegree ORDER BY R, ChildDegree
WITH collect({p:Parent, c:Child, r: R, cd:ChildDegree }) as collect_nodes, collect(ChildDegree) as collect_degs
WITH collect_nodes, collect_degs, RANGE(0, SIZE(collect_degs)-1) AS range_idx
UNWIND range_idx as idx
WITH collect_nodes[idx] as nodes, REDUCE(acc = 0, value in (collect_degs[idx] + collect_degs[..idx]) | acc + value) AS sum_degree
UNWIND nodes as n_set
WITH n_set.p as Parent, n_set.c as Child, n_set.r as R, n_set.cd as ChildDegree WHERE sum_degree <= 35
RETURN Parent, Child, R, ChildDegree
Sample result:
╒═══════════════════════════╤══════════════════════════════════════════════════════════════════════╤══════════╤═════════════╕
│"Parent" │"Child" │"R" │"ChildDegree"│
╞═══════════════════════════╪══════════════════════════════════════════════════════════════════════╪══════════╪═════════════╡
│{"name":"Jessica Thompson"}│{"name":"Paul Blythe"} │"FOLLOWS" │2 │
├───────────────────────────┼──────────────────────────────────────────────────────────────────────┼──────────┼─────────────┤
│{"name":"Jessica Thompson"}│{"name":"James Thompson"} │"FOLLOWS" │3 │
├───────────────────────────┼──────────────────────────────────────────────────────────────────────┼──────────┼─────────────┤
│{"name":"Jessica Thompson"}│{"name":"Angela Scope"} │"FOLLOWS" │3 │
├───────────────────────────┼──────────────────────────────────────────────────────────────────────┼──────────┼─────────────┤
│{"name":"Jessica Thompson"}│{"tagline":"Come as you are","title":"The Birdcage","released":1996} │"REVIEWED"│5 │
├───────────────────────────┼──────────────────────────────────────────────────────────────────────┼──────────┼─────────────┤
│{"name":"Jessica Thompson"}│{"tagline":"It's a hell of a thing, killing a man","title":"Unforgiven│"REVIEWED"│5 │
│ │","released":1992} │ │ │
├───────────────────────────┼──────────────────────────────────────────────────────────────────────┼──────────┼─────────────┤
│{"name":"Jessica Thompson"}│{"tagline":"Break The Codes","title":"The Da Vinci Code","released":20│"REVIEWED"│7 │
│ │06} │ │ │
├───────────────────────────┼──────────────────────────────────────────────────────────────────────┼──────────┼─────────────┤
│{"name":"Jessica Thompson"}│{"tagline":"Pain heals, Chicks dig scars... Glory lasts forever","titl│"REVIEWED"│8 │
│ │e":"The Replacements","released":2000} │ │ │
└───────────────────────────┴──────────────────────────────────────────────────────────────────────┴──────────┴─────────────┘
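The running-sum filter from the steps above may be easier to follow outside Cypher. This Python sketch uses made-up rows of (relationship type, child degree), already sorted the way the query sorts them (by R, then ChildDegree), and keeps a row only while the cumulative degree including that row stays at or below 35. Because degrees are never negative, the running sum only grows, so the kept rows always form a prefix of the sorted list.

```python
from itertools import accumulate

# Hypothetical rows of (relationship type, child degree), sorted by type and then degree.
rows = [("FOLLOWS", 2), ("FOLLOWS", 3), ("FOLLOWS", 3),
        ("REVIEWED", 5), ("REVIEWED", 5), ("REVIEWED", 7), ("REVIEWED", 8), ("REVIEWED", 9)]

def cut_by_degree_sum(rows, max_sum=35):
    """Keep rows whose running (inclusive) sum of degrees is <= max_sum."""
    running = accumulate(degree for _, degree in rows)
    return [row for row, total in zip(rows, running) if total <= max_sum]

kept = cut_by_degree_sum(rows)
print(len(kept), sum(d for _, d in kept))  # 7 33
```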
I am trying to find a way to group combinations together.
Say we have nodes of type Person, Hobby, Place and City, and the graph has the following relationships (merged):
CREATE
(Joe:Person {name: 'Joe'}),
(hike:Hobby {name: 'hike'}),
(eat:Hobby {name: 'eat'}),
(drink:Hobby {name: 'drink'}),
(Mountain:Place {name: 'Mountain'}),
(Lake:Place {name: 'Lake'}),
(DavesBarGrill:Place {name: 'Daves BarGrill'}),
(Diner:Place {name: 'Diner'}),
(Lounge:Place {name: 'Lounge'}),
(DiveBar:Place {name: 'Dive Bar'}),
(Joe)-[:likes]->(hike),
(Joe)-[:likes]->(eat),
(Joe)-[:likes]->(drink),
(hike)-[:canDoAt]->(Mountain),
(hike)-[:canDoAt]->(Lake),
(eat)-[:canDoAt]->(DavesBarGrill),
(eat)-[:canDoAt]->(Diner),
(drink)-[:canDoAt]->(Lounge),
(drink)-[:canDoAt]->(DiveBar)
Suppose Joe plans a day in which he does each of his hobbies once: there are 8 (2 × 2 × 2) combinations of places to hike, eat and drink. I want to be able to capture this in a query.
The naive approach,
MATCH (p:Person)-[:likes]->(h:Hobby)-[:canDoAt]->(pl:Place)
RETURN p, h, pl
will at best be able to group by person and hobby, which puts rows of the same hobby together. What I want is to somehow group by combination, i.e.:
// Joe combo 1
Joe, hike, Mountain
Joe, eat, Daves
Joe, drink, Lounge
// Joe combo 2
Joe, hike, Lake
Joe, eat, Daves
Joe, drink, Lounge
Is there a way to somehow assign a number to all path matches and then use that assignment to sort?
That's a very good question! I don't have the whole solution yet, but some thoughts: as Martin Preusse said, we are looking to generate a Cartesian product.
This is difficult, but you can work around it with quite a bit of hacking, including a nested (double) reduce:
WITH [['a', 'b'], [1, 2, 3], [true, false]] AS hs
WITH hs, size(hs) AS numberOfHobbys, reduce(acc = 1, h in hs | acc * size(h)) AS numberOfCombinations, [h IN hs | size(h)] AS hLengths
WITH hs, hLengths, numberOfHobbys, range(0, numberOfCombinations-1) AS combinationIndexes
UNWIND combinationIndexes AS combinationIndex
WITH
combinationIndex,
reduce(acc = [], i in range(0, numberOfHobbys-1) |
acc + toInteger(combinationIndex/(reduce(acc2 = 1, j in range(0, i-1) | acc2 * hLengths[j]))) % hLengths[i]
) AS indices,
reduce(acc = [], i in range(0, numberOfHobbys-1) |
acc + reduce(acc2 = 1, j in range(0, i-1) | acc2 * hLengths[j])
) AS multipliers,
reduce(acc = [], i in range(0, numberOfHobbys-1) |
acc + hs[i][
toInteger(combinationIndex/(reduce(acc2 = 1, j in range(0, i-1) | acc2 * hLengths[j]))) % hLengths[i]
]
) AS combinations
RETURN combinationIndex, indices, multipliers, combinations
The idea is as follows: we multiply the numbers of potential values; e.g. for ['a', 'b'], [1, 2, 3], [true, false], the first reduce in the query computes n = 2×3×2 = 12. We then iterate from 0 to n-1 and map each number to one row using the formula a×1 + b×2 + c×6, where a, b and c are the indexes into the respective lists, so they are non-negative integers with a < 2, b < 3 and c < 2. In other words, each number is written in a mixed-radix system, and the query recovers the indexes as a = n % 2, b = (n / 2) % 3 and c = (n / 6) % 2, using integer division.
0×1 + 0×2 + 0×6 = 0
1×1 + 0×2 + 0×6 = 1
0×1 + 1×2 + 0×6 = 2
1×1 + 1×2 + 0×6 = 3
0×1 + 2×2 + 0×6 = 4
1×1 + 2×2 + 0×6 = 5
0×1 + 0×2 + 1×6 = 6
1×1 + 0×2 + 1×6 = 7
0×1 + 1×2 + 1×6 = 8
1×1 + 1×2 + 1×6 = 9
0×1 + 2×2 + 1×6 = 10
1×1 + 2×2 + 1×6 = 11
The result is:
╒════════════════╤═════════╤═══════════╤═════════════╕
│combinationIndex│indices │multipliers│combinations │
╞════════════════╪═════════╪═══════════╪═════════════╡
│0 │[0, 0, 0]│[1, 2, 6] │[a, 1, true] │
├────────────────┼─────────┼───────────┼─────────────┤
│1 │[1, 0, 0]│[1, 2, 6] │[b, 1, true] │
├────────────────┼─────────┼───────────┼─────────────┤
│2 │[0, 1, 0]│[1, 2, 6] │[a, 2, true] │
├────────────────┼─────────┼───────────┼─────────────┤
│3 │[1, 1, 0]│[1, 2, 6] │[b, 2, true] │
├────────────────┼─────────┼───────────┼─────────────┤
│4 │[0, 2, 0]│[1, 2, 6] │[a, 3, true] │
├────────────────┼─────────┼───────────┼─────────────┤
│5 │[1, 2, 0]│[1, 2, 6] │[b, 3, true] │
├────────────────┼─────────┼───────────┼─────────────┤
│6 │[0, 0, 1]│[1, 2, 6] │[a, 1, false]│
├────────────────┼─────────┼───────────┼─────────────┤
│7 │[1, 0, 1]│[1, 2, 6] │[b, 1, false]│
├────────────────┼─────────┼───────────┼─────────────┤
│8 │[0, 1, 1]│[1, 2, 6] │[a, 2, false]│
├────────────────┼─────────┼───────────┼─────────────┤
│9 │[1, 1, 1]│[1, 2, 6] │[b, 2, false]│
├────────────────┼─────────┼───────────┼─────────────┤
│10 │[0, 2, 1]│[1, 2, 6] │[a, 3, false]│
├────────────────┼─────────┼───────────┼─────────────┤
│11 │[1, 2, 1]│[1, 2, 6] │[b, 3, false]│
└────────────────┴─────────┴───────────┴─────────────┘
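The mixed-radix decoding can be checked in a few lines of Python. This sketch (the `decode` helper is a hypothetical name, not part of any library) recovers each index as (n // product of earlier lengths) % this length, the same arithmetic as the nested reduce, and for the example lists it yields the same indices and combinations as the table above.

```python
import math

def decode(combination_index, lengths):
    """Split a combination index into one index per list; the first list varies fastest."""
    indices = []
    multiplier = 1  # product of the lengths of all earlier lists
    for length in lengths:
        # Same arithmetic as the nested reduce: (index / multiplier) % length
        indices.append((combination_index // multiplier) % length)
        multiplier *= length
    return indices

hs = [["a", "b"], [1, 2, 3], [True, False]]
lengths = [len(h) for h in hs]
total = math.prod(lengths)  # 2 * 3 * 2 = 12 combinations

for n in range(total):
    idx = decode(n, lengths)
    print(n, idx, [h[i] for h, i in zip(hs, idx)])
```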
So, for your problem, the query might look like this:
MATCH (p:Person)-[:likes]->(h:Hobby)-[:canDoAt]->(pl:Place)
WITH p, h, collect(pl.name) AS places
WITH p, collect(places) AS hs
WITH hs, size(hs) AS numberOfHobbys, reduce(acc = 1, h in hs | acc * size(h)) AS numberOfCombinations, [h IN hs | size(h)] AS hLengths
WITH hs, hLengths, numberOfHobbys, range(0, numberOfCombinations-1) AS combinationIndexes
UNWIND combinationIndexes AS combinationIndex
WITH
reduce(acc = [], i in range(0, numberOfHobbys-1) |
acc + hs[i][
toInteger(combinationIndex/(reduce(acc2 = 1, j in range(0, i-1) | acc2 * hLengths[j]))) % hLengths[i]
]
) AS combinations
RETURN combinations
The result looks like this:
╒════════════════════════════════════╕
│combinations │
╞════════════════════════════════════╡
│[Diner, Lounge, Lake] │
├────────────────────────────────────┤
│[Daves BarGrill, Lounge, Lake] │
├────────────────────────────────────┤
│[Diner, Dive Bar, Lake] │
├────────────────────────────────────┤
│[Daves BarGrill, Dive Bar, Lake] │
├────────────────────────────────────┤
│[Diner, Lounge, Mountain] │
├────────────────────────────────────┤
│[Daves BarGrill, Lounge, Mountain] │
├────────────────────────────────────┤
│[Diner, Dive Bar, Mountain] │
├────────────────────────────────────┤
│[Daves BarGrill, Dive Bar, Mountain]│
└────────────────────────────────────┘
Obviously, we would also like to get the person and the names of his/her hobbies:
MATCH (p:Person)-[:likes]->(h:Hobby)-[:canDoAt]->(pl:Place)
WITH p, h, collect([h.name, pl.name]) AS places
WITH p, collect(places) AS hs
WITH p, hs, size(hs) AS numberOfHobbys, reduce(acc = 1, h in hs | acc * size(h)) AS numberOfCombinations, [h IN hs | size(h)] AS hLengths
WITH p, hs, hLengths, numberOfHobbys, range(0, numberOfCombinations-1) AS combinationIndexes
UNWIND combinationIndexes AS combinationIndex
WITH
p, reduce(acc = [], i in range(0, numberOfHobbys-1) |
acc + [hs[i][
toInteger(combinationIndex/(reduce(acc2 = 1, j in range(0, i-1) | acc2 * hLengths[j]))) % hLengths[i]
]]
) AS combinations
RETURN p, combinations
The results:
╒═══════════╤════════════════════════════════════════════════════════════╕
│p │combinations │
╞═══════════╪════════════════════════════════════════════════════════════╡
│{name: Joe}│[[eat, Diner], [drink, Lounge], [hike, Lake]] │
├───────────┼────────────────────────────────────────────────────────────┤
│{name: Joe}│[[eat, Daves BarGrill], [drink, Lounge], [hike, Lake]] │
├───────────┼────────────────────────────────────────────────────────────┤
│{name: Joe}│[[eat, Diner], [drink, Dive Bar], [hike, Lake]] │
├───────────┼────────────────────────────────────────────────────────────┤
│{name: Joe}│[[eat, Daves BarGrill], [drink, Dive Bar], [hike, Lake]] │
├───────────┼────────────────────────────────────────────────────────────┤
│{name: Joe}│[[eat, Diner], [drink, Lounge], [hike, Mountain]] │
├───────────┼────────────────────────────────────────────────────────────┤
│{name: Joe}│[[eat, Daves BarGrill], [drink, Lounge], [hike, Mountain]] │
├───────────┼────────────────────────────────────────────────────────────┤
│{name: Joe}│[[eat, Diner], [drink, Dive Bar], [hike, Mountain]] │
├───────────┼────────────────────────────────────────────────────────────┤
│{name: Joe}│[[eat, Daves BarGrill], [drink, Dive Bar], [hike, Mountain]]│
└───────────┴────────────────────────────────────────────────────────────┘
I might be overthinking this, so any comments are welcome.
An important remark: the fact that this is so complicated with pure Cypher is probably a good sign that you're better off calculating this from the client application.
I'm pretty sure you cannot do this in Cypher. What you are looking for is the Cartesian product of all places, grouped by person and hobby:
A: [ [Joe, hike, Mountain], [Joe, hike, Lake] ]
B: [ [Joe, eat, Daves], [Joe, eat, Diner] ]
C: [ [Joe, drink, Lounge], [Joe, drink, Bar] ]
And you are looking for A x B x C.
As far as I know, you can't group the returned rows like this in Cypher. You should return all (person, hobby, place) rows and do this in a Python script that builds the grouped sets and computes the Cartesian product.
The problem is that the number of combinations grows multiplicatively as the numbers of hobbies and places grow.
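A minimal sketch of that client-side approach, assuming the (person, hobby, place) rows from the naive query have already been fetched into a plain Python list (driver code omitted; the rows below are just the example graph written out by hand). The number of plans per person is the product of the per-hobby place counts, which is exactly the growth mentioned above.

```python
from collections import defaultdict
from itertools import product

# Hypothetical rows as returned by MATCH (p:Person)-[:likes]->(h:Hobby)-[:canDoAt]->(pl:Place)
rows = [
    ("Joe", "hike", "Mountain"), ("Joe", "hike", "Lake"),
    ("Joe", "eat", "Daves BarGrill"), ("Joe", "eat", "Diner"),
    ("Joe", "drink", "Lounge"), ("Joe", "drink", "Dive Bar"),
]

def day_plans(rows):
    """Yield (person, combo_number, [(hobby, place), ...]) for every combination."""
    # Group places by person and hobby: {person: {hobby: [place, ...]}}
    grouped = defaultdict(lambda: defaultdict(list))
    for person, hobby, place in rows:
        grouped[person][hobby].append(place)
    for person, hobbies in grouped.items():
        names = list(hobbies)
        # Cartesian product: pick one place for each hobby
        for number, places in enumerate(product(*(hobbies[h] for h in names)), start=1):
            yield person, number, list(zip(names, places))

for person, number, combo in day_plans(rows):
    print(person, number, combo)
```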