neo4j: type counts by distance - neo4j

I want to have some count statistics by type by distance from root. For example,
(A type:'private')-[value:20]->(B type:'private')-[value:40]->(C type:'private')
(A type:'private')-[value:0]->(D type:'public')-[value:20]->(E type:'private')
CREATE (:firm {name:'A', type:'private'}), (:firm {name:'B', type:'private'}), (:firm {name:'C', type:'private'}), (:firm {name:'D', type:'public'}), (:firm {name:'E', type:'private'});
MATCH (a:firm {name:'A'}), (b:firm {name:'B'}), (c:firm {name:'C'}), (d:firm {name:'D'}), (e:firm {name:'E'})
CREATE (a)-[:REL {value: 20}]->(b)->[:REL {value: 40}]->(c),
(a)-[:REL {value: 0}]->(d)->[:REL {value: 20}]->(e);
I want to get the count of each type of A's immediate neighbors and that of the 2nd layer neighbors, i.e.,
+-----------------------------+
| distance | type | count |
+-----------------------------+
| 0 | private | 1 |
| 0 | public | 0 |
| 1 | private | 1 |
| 1 | public | 1 |
| 2 | private | 2 |
| 2 | public | 0 |
+-----------------------------+
Here is a related question about aggregate statistics by distance.
Thanks!

For this on, the apoc library comes in handy:
MATCH path=(:firm {name:'A'})-[:REL*]->(leaf:firm)
WHERE NOT (leaf)-[:REL]->(:firm)
WITH COLLECT(path) AS paths, max(length(path)) AS longest
UNWIND RANGE(0,longest) AS depth
WITH depth,
apoc.coll.frequencies([node IN apoc.coll.toSet(REDUCE(arr=[], path IN [p IN paths WHERE length(p) >= depth] |
arr
+ nodes(path)[depth]
)
) | node.type
]) as typesAtDepth
UNWIND typesAtDepth AS typeAtDepth
RETURN depth, typeAtDepth.item AS type, typeAtDepth.count AS count
for this dataset
CREATE (_174:`firm` { `name`: 'A', `type`: 'type2' }) CREATE (_200:`firm` { `name`: 'D', `type`: 'type2' }) CREATE (_202:`firm` { `name`: 'E', `type`: 'type2' }) CREATE (_203:`firm` { `name`: 'F', `type`: 'type1' }) CREATE (_191:`firm` { `name`: 'B', `type`: 'type1' }) CREATE (_193:`firm` { `name`: 'C', `type`: 'type2' }) CREATE (_174)-[:`REL` { `value`: '0' }]->(_200) CREATE (_200)-[:`REL` { `value`: '20' }]->(_202) CREATE (_202)-[:`REL` { `value`: '99' }]->(_203) CREATE (_174)-[:`REL` { `value`: '20' }]->(_191) CREATE (_191)-[:`REL` { `value`: '40' }]->(_193)
it returns this result:
╒═══════╤═══════╤═══════╕
│"depth"│"type" │"count"│
╞═══════╪═══════╪═══════╡
│0 │"type2"│1 │
├───────┼───────┼───────┤
│1 │"type2"│1 │
├───────┼───────┼───────┤
│1 │"type1"│1 │
├───────┼───────┼───────┤
│2 │"type2"│2 │
├───────┼───────┼───────┤
│3 │"type1"│1 │
└───────┴───────┴───────┘

Related

extract decorating nodes if it exists but still return path if decorating nodes does not exist

I have the following graph
(y1:Y)
^
|
(a1:A) -> (b1:B) -> (c1:C)
(e1:E)
^
|
(d1:D)
^
|
(a2:A) -> (b2:B) -> (c2:C)
(a3:A) -> (b3:B) -> (c3:C)
I would like to find path between node label A and C. I can use the query
match p=((:A)-[*]->(:C))
return p
But I also want to get node label Y and node label D, E if these decorating nodes exists. If I try:
match p=((:A)-[*]->(cc:C)), (cc)-->(yy:Y), (cc)-[*]->(dd:D)-[*]->(ee:E)
return p, yy, dd, ee
Then it is only going to return the path if the C node has Y, D, E connects to it.
The output that I need is:
a1->b1->c1, y1, null
a2->b2->c2, null, [[d1, e1]]
a3->b3->c3, null, null
I.e., if decorating node does not exist, then just return null. For the array, it can be null or empty array. Also D and E nodes will be group into an array of arrays since there could be many pairs of D and E.
What is the best way to achieve this?
This should do it, returning an empty array for the deDecoration if there aren't any D-E decorations
MATCH p=((:A)-[*]->(c:C))
WITH p,
HEAD([(c)--(y:Y) | y ]) AS yDecoration,
[(c)-[*]->(d:D)-[*]->(e:E) | [d,e]] AS deDecoration
RETURN p, yDecoration, deDecoration
with this graph (multiple D-E)
this query
MATCH p=((:A)-[*]->(c:C))
WITH REDUCE(s='' , node IN nodes(p) | s + CASE WHEN s='' THEN '' ELSE '->' END + node.name) AS p,
HEAD([(c)--(y:Y) | y.name ]) AS yDecoration,
[(c)-[*]->(d:D)-[*]->(e:E) | [d.name,e.name]] AS deDecoration
RETURN p, yDecoration, deDecoration
returns
╒════════════╤═════════════╤═════════════════════════╕
│"p" │"yDecoration"│"deDecoration" │
╞════════════╪═════════════╪═════════════════════════╡
│"A2->B2->C2"│null │[] │
├────────────┼─────────────┼─────────────────────────┤
│"A1->B1->C1"│null │[["D2","E2"],["D1","E1"]]│
├────────────┼─────────────┼─────────────────────────┤
│"A3->B3->C3"│"Y1" │[] │
└────────────┴─────────────┴─────────────────────────┘

neo4j aggregate function by distance

I want to have some aggregated statistics by distance from root. For example,
(A)-[value:20]->(B)-[value:40]->(C)
(A)-[value:0]->(D)-[value:20]->(E)
CREATE (:firm {name:'A'}), (:firm {name:'B'}), (:firm {name:'C'}), (:firm {name:'D'}), (:firm {name:'E'});
MATCH (a:firm {name:'A'}), (b:firm {name:'B'}), (c:firm {name:'C'}), (d:firm {name:'D'}), (e:firm {name:'E'})
CREATE (a)-[:REL {value: 20}]->(b)->[:REL {value: 40}]->(c),
(a)-[:REL {value: 0}]->(d)->[:REL {value: 20}]->(e);
I want to get the average value of A's immediate neighbors and that of the 2nd layer neighbors, i.e.,
+-------------------+
| distance | avg |
+-------------------+
| 1 | 10 |
| 2 | 30 |
+-------------------+
How should I do it? I have tried the following
MATCH p=(n:NODE {name:'A'})-[r:REL*1..2]->(n:NODE)
RETURN length(p), sum(r:value);
But I am not sure how to operate on the variable-length path r.
Similarly, is it possible to get the cumulative value? i.e.,
+-------------------+
| name | cum |
+-------------------+
| B | 20 |
| C | 60 |
| D | 0 |
| E | 20 |
+-------------------+
The query below solves the first problem. Please note that it also solves the case where paths are not of equal length. I added (E)-[REL {value:99}]->(F)
MATCH path=(:firm {name:'A'})-[:REL*]->(leaf:firm)
WHERE NOT (leaf)-[:REL]->(:firm)
WITH COLLECT(path) AS paths, max(length(path)) AS longest
UNWIND RANGE(1,longest) AS depth
WITH depth,
REDUCE(sum=0, path IN [p IN paths WHERE length(p) >= depth] |
sum
+ relationships(path)[depth-1].value
) AS sumAtDepth,
SIZE([p IN paths WHERE length(p) >= depth]) AS countAtDepth
RETURN depth, sumAtDepth, countAtDepth, sumAtDepth/countAtDepth AS avgAtDepth
returning
╒═══════╤════════════╤══════════════╤════════════╕
│"depth"│"sumAtDepth"│"countAtDepth"│"avgAtDepth"│
╞═══════╪════════════╪══════════════╪════════════╡
│1 │20 │2 │10 │
├───────┼────────────┼──────────────┼────────────┤
│2 │60 │2 │30 │
├───────┼────────────┼──────────────┼────────────┤
│3 │99 │1 │99 │
└───────┴────────────┴──────────────┴────────────┘
The second question can be answered as follows:
MATCH (root:firm {name:'A'})
MATCH (descendant:firm) WHERE EXISTS((root)-[:REL*]->(descendant))
WITH root,descendant
WITH descendant,
REDUCE(sum=0,rel IN relationships([(descendant)<-[:REL*]-(root)][0][0]) |
sum + rel.value
) AS cumulative
RETURN descendant.name,cumulative ORDER BY descendant.name
returning
╒═════════════════╤════════════╕
│"descendant.name"│"cumulative"│
╞═════════════════╪════════════╡
│"B" │20 │
├─────────────────┼────────────┤
│"C" │60 │
├─────────────────┼────────────┤
│"D" │0 │
├─────────────────┼────────────┤
│"E" │20 │
├─────────────────┼────────────┤
│"F" │119 │
└─────────────────┴────────────┘
may I suggest your try it with a reduce function, you can retro fit it your code
// Match something name or distance..
MATCH
// If you have a condition put in here
// WHERE A<>B AND n.name = m.name
// WITH filterItems, collect(m) AS myItems
// Reduce will help sum/aggregate entire you are looking for
RETURN reduce( sum=0, x IN myItems | sum+x.cost )
LIMIT 10;

How to match node labels using OR?

match (p:Product {id:'5116003'})-[r]->(o:Attributes|ExtraAttribute) return p, o
How to match two possible node labels in such a query?
Per cybersam's suggestion, I changed to the follwoing:
MATCH (p:Product {id:'5116003'})-[r]->(o)
WHERE o:Attributes OR o:ExtraAttributes
**WHERE any(key in keys(o) WHERE toLower(key) contains 'weight')**
return o
Now I need to add the 2nd 'where' clause. How to modify that?
You can try using any() function:
match (p:Product {id:'5116003'})-[r]->(o)
where any (label in labels(o) where label in ['Attributes', 'ExtraAttribute'])
return p, o
Also, if you have APOC procedures, you can use apoc.path.expand path expander procedure that expands from start node following the given relationships from min to max-level adhering to the label filters.
match (p:Product {id:'5116003'})
call apoc.path.expand(p, null,"+Attributes|ExtraAttribute",0,1) yield path
with nodes(path) as nodes
// return p and o nodes
return nodes[0], nodes[1]
See more here.
These two single-label forms of your query:
MATCH (p:Product {id:'5116003'})-->(o:Attributes) RETURN p, o;
MATCH (p:Product {id:'5116003'})-->(o) WHERE o:Attributes RETURN p, o;
produce the same execution plan, as follows (I assume that there is an index on :Product(id)):
+-----------------+----------------+------+---------+------------------+--------------+
| Operator | Estimated Rows | Rows | DB Hits | Variables | Other |
+-----------------+----------------+------+---------+------------------+--------------+
| +ProduceResults | 0 | 0 | 0 | o, p | p, o |
| | +----------------+------+---------+------------------+--------------+
| +Filter | 0 | 0 | 0 | anon[33], o, p | o:Attributes |
| | +----------------+------+---------+------------------+--------------+
| +Expand(All) | 0 | 0 | 0 | anon[33], o -- p | (p)-->(o) |
| | +----------------+------+---------+------------------+--------------+
| +NodeIndexSeek | 0 | 0 | 1 | p | :Product(id) |
+-----------------+----------------+------+---------+------------------+--------------+
This two-label form of the second query above:
MATCH (p:Product {id:'5116003'})-->(o) WHERE o:Attributes OR o: ExtraAttribute RETURN p, o;
produces an execution plan that is very similar (and therefore probably not much more expensive):
+-----------------+----------------+------+---------+------------------+-------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Variables | Other |
+-----------------+----------------+------+---------+------------------+-------------------------------------+
| +ProduceResults | 0 | 0 | 0 | o, p | p, o |
| | +----------------+------+---------+------------------+-------------------------------------+
| +Filter | 0 | 0 | 0 | anon[33], o, p | Ors(o:Attributes, o:ExtraAttribute) |
| | +----------------+------+---------+------------------+-------------------------------------+
| +Expand(All) | 0 | 0 | 0 | anon[33], o -- p | (p)-->(o) |
| | +----------------+------+---------+------------------+-------------------------------------+
| +NodeIndexSeek | 0 | 0 | 1 | p | :Product(id) |
+-----------------+----------------+------+---------+------------------+-------------------------------------+
By the way, the first query in the answer by #BrunoPeres has a similar execution plan as well, but the Filter operation is very different. It is not clear which would be faster.
[UPDATE]
To answer your updated question: since you cannot have 2 back-to-back WHERE clauses, you can just add more terms to the already existing WHERE clause, like so:
MATCH (p:Product {id:'5116003'})-[r]->(o)
WHERE
(o:Attributes OR o:ExtraAttributes) AND
ANY(key in KEYS(o) WHERE TOLOWER(key) CONTAINS 'weight')
RETURN o;

Aggregating over Distinct Nodes

The data model I have is: (Item)-[:HAS_AN]->(ItemType) and both node types have a field called ID. Items can be related to multiple ItemTypes and in some cases, these ItemTypes can have the same IDs. I'm trying to populate a structure {ID:..., SameType:...}, where SameType = 1 if a node has the same item type as some node (with ID = 1234), and 0 otherwise.
First, I get the candidate list of nodes qList and the source node's ItemType:
MATCH (p:Item{ID:1234})-[:HAS_AN]->(i)
WITH i as pItemType, qList
Then, I go through qList, comparing each node's ItemType to pItemType (which is the ItemType of the source node):
UNWIND qList as q
MATCH (q)-[:HAS_AN]->(i)
WITH q.ID as qID, pItemType, i,
CASE i
WHEN pItemType THEN 1
ELSE 0
END as SameType
RETURN DISTINCT i, qID, pItemType, SameType
The problem I have is when some nodes have two ItemTypes with the same ID. This gives results where some of the entries are duplicates:
{ | | { |
"ID": 18 | 35258417 | "ID": 71 | 0
} | | } |
{ | | { |
"ID": 18 | 35258417 | "ID": 71 | 0
} | | } |
while I'd like to only take one such row, if more than one exists.
Placing DISTINCT where I have in the last part of the query doesn't seem to work. What's the best way to filter out such duplicates?
Update:
Here is a sample data subset: http://console.neo4j.org/r/f74pdq
Here are the queries that I'm running
MATCH (q:Item) WHERE q.ID <> 1234 WITH COLLECT(DISTINCT(q)) as qList
MATCH (p:Item{ID:1234})-[:HAS_AN]->(i:ItemType) WITH i as pItemType, qList
UNWIND qList as q
MATCH (q)-[:HAS_AN]->(i:ItemType) WITH q.ID as qID, pItemType, i,
CASE i
WHEN pItemType THEN 1
ELSE 0
END as SameType
RETURN DISTINCT i, qID, pItemType, SameType
In this example, Item with ID = 2 has two HAS_AN relations with 2 ItemType nodes with the same ID. I would like only one of them to be returned.
I've tried to simplify your query. Take a look:
MATCH (:Item {ID : 1234})-[:HAS_AN]->(target:ItemType)
MATCH (item:Item)-[:HAS_AN]->(itemType:ItemType)
WHERE item.ID <> 1234
WITH
itemType.ID AS i,
item.ID AS qID,
collect({
pItemType : target,
SameType : CASE exists((item)-[:HAS_AN]-(target))
WHEN true THEN 1 ELSE 0 END
})[0] as Item
RETURN i, qID, Item.pItemType AS pItemType, Item.SameType AS SameType
The trick is in the two lines after WITH. At this point I'm grouping by itemType.ID and item.ID and not ( and not itemType and item). In your original query you are using pItemType to group. This does not work because the two ItemTypes with ID = 34 are different nodes although they have the same ID.
The output from your console:
+-------------------------------------+
| i | qID | pItemType | SameType |
+-------------------------------------+
| 31 | 4 | Node[2]{ID:5} | 0 |
| 5 | 3 | Node[2]{ID:5} | 1 |
| 31 | 5 | Node[2]{ID:5} | 0 |
| 45 | 5 | Node[2]{ID:5} | 0 |
| 5 | 1 | Node[2]{ID:5} | 1 |
| 34 | 2 | Node[2]{ID:5} | 0 |
+-------------------------------------+
6 rows
33 ms
Thanks to #Bruno's solution, I was able to get the right answers. However, the original solution did not work right off the bat for me for two reasons - I needed the qList since I was referring to it later, and it had approximately 4 times the DB hits as the query in my question. So, I tried a few optimizations that brought the number of DB hits down to half, and am sharing it here for posterity.
MATCH (q:Item) WHERE q.ID <> 1234 WITH COLLECT(DISTINCT(q)) as qList
MATCH (p:Item{ID:1234})-[:HAS_AN]->(i:ItemType) WITH i as pItemType, qList
UNWIND qList as item
MATCH (item)-[:HAS_AN]->(i)
WITH
i, pItemType,
item.ID AS qID,
collect({
pItemType : pItemType,
SameType : CASE i.ID
WHEN pItemType.ID THEN 1 ELSE 0 END
})[0] as Item
RETURN i, qID, Item.pItemType AS pItemType, Item.SameType AS SameType
Turns out running MATCH (item:Item)-[:HAS_AN]->(itemType:ItemType) was adding a Filter operation that took almost as many DB hits as it had matches.

Neo4J returned many values

I have query like this:
MATCH p = (ob:Obiect)<--(w:Word { value:'human' })-[*]-(x) RETURN {id: id(x), value: x.value}
I am returning "x" but i want to return also the “w”. The big question is: how to have “w” returned?
I tried like this:
MATCH p = (ob:Obiect)<--(w:Word { value:'human' })-[*]-(x) RETURN {id: id(x), value: x.value, rootid: id(w)}
but then the output looks like
x.id | x.value | w.id | w.value
x.id | x.value | w.id | w.value
x.id | x.value | w.id | w.value
but for NULL values of x there are no values of “w”, but I also need them.
try OPTIONAL MATCH, see: http://neo4j.com/docs/2.1.5/query-optional-match.html
MATCH (ob:Obiect)<--(w:Word { value:'human' })
OPTIONAL MATCH (w)-[*]-(x)
RETURN {id: id(x), value: x.value, rootid: id(w)}

Resources