neo4j: type counts by distance

neo4j: type counts by distance - neo4j

I want to have some count statistics by type by distance from root. For example,
(A type:'private')-[value:20]->(B type:'private')-[value:40]->(C type:'private')
(A type:'private')-[value:0]->(D type:'public')-[value:20]->(E type:'private')
CREATE (:firm {name:'A', type:'private'}), (:firm {name:'B', type:'private'}), (:firm {name:'C', type:'private'}), (:firm {name:'D', type:'public'}), (:firm {name:'E', type:'private'});
MATCH (a:firm {name:'A'}), (b:firm {name:'B'}), (c:firm {name:'C'}), (d:firm {name:'D'}), (e:firm {name:'E'})
CREATE (a)-[:REL {value: 20}]->(b)->[:REL {value: 40}]->(c),
(a)-[:REL {value: 0}]->(d)->[:REL {value: 20}]->(e);
I want to get the count of each type of A's immediate neighbors and that of the 2nd layer neighbors, i.e.,
+-----------------------------+
| distance | type | count |
+-----------------------------+
| 0 | private | 1 |
| 0 | public | 0 |
| 1 | private | 1 |
| 1 | public | 1 |
| 2 | private | 2 |
| 2 | public | 0 |
+-----------------------------+
Here is a related question about aggregate statistics by distance.
Thanks!

For this on, the apoc library comes in handy:
MATCH path=(:firm {name:'A'})-[:REL*]->(leaf:firm)
WHERE NOT (leaf)-[:REL]->(:firm)
WITH COLLECT(path) AS paths, max(length(path)) AS longest
UNWIND RANGE(0,longest) AS depth
WITH depth,
apoc.coll.frequencies([node IN apoc.coll.toSet(REDUCE(arr=[], path IN [p IN paths WHERE length(p) >= depth] |
arr
+ nodes(path)[depth]
)
) | node.type
]) as typesAtDepth
UNWIND typesAtDepth AS typeAtDepth
RETURN depth, typeAtDepth.item AS type, typeAtDepth.count AS count
for this dataset
CREATE (_174:`firm` { `name`: 'A', `type`: 'type2' }) CREATE (_200:`firm` { `name`: 'D', `type`: 'type2' }) CREATE (_202:`firm` { `name`: 'E', `type`: 'type2' }) CREATE (_203:`firm` { `name`: 'F', `type`: 'type1' }) CREATE (_191:`firm` { `name`: 'B', `type`: 'type1' }) CREATE (_193:`firm` { `name`: 'C', `type`: 'type2' }) CREATE (_174)-[:`REL` { `value`: '0' }]->(_200) CREATE (_200)-[:`REL` { `value`: '20' }]->(_202) CREATE (_202)-[:`REL` { `value`: '99' }]->(_203) CREATE (_174)-[:`REL` { `value`: '20' }]->(_191) CREATE (_191)-[:`REL` { `value`: '40' }]->(_193)
it returns this result:
╒═══════╤═══════╤═══════╕
│"depth"│"type" │"count"│
╞═══════╪═══════╪═══════╡
│0 │"type2"│1 │
├───────┼───────┼───────┤
│1 │"type2"│1 │
├───────┼───────┼───────┤
│1 │"type1"│1 │
├───────┼───────┼───────┤
│2 │"type2"│2 │
├───────┼───────┼───────┤
│3 │"type1"│1 │
└───────┴───────┴───────┘

Related

extract decorating nodes if it exists but still return path if decorating nodes does not exist

I have the following graph
(y1:Y)
^
|
(a1:A) -> (b1:B) -> (c1:C)
(e1:E)
^
|
(d1:D)
^
|
(a2:A) -> (b2:B) -> (c2:C)
(a3:A) -> (b3:B) -> (c3:C)
I would like to find path between node label A and C. I can use the query
match p=((:A)-[*]->(:C))
return p
But I also want to get node label Y and node label D, E if these decorating nodes exists. If I try:
match p=((:A)-[*]->(cc:C)), (cc)-->(yy:Y), (cc)-[*]->(dd:D)-[*]->(ee:E)
return p, yy, dd, ee
Then it is only going to return the path if the C node has Y, D, E connects to it.
The output that I need is:
a1->b1->c1, y1, null
a2->b2->c2, null, [[d1, e1]]
a3->b3->c3, null, null
I.e., if decorating node does not exist, then just return null. For the array, it can be null or empty array. Also D and E nodes will be group into an array of arrays since there could be many pairs of D and E.
What is the best way to achieve this?

This should do it, returning an empty array for the deDecoration if there aren't any D-E decorations
MATCH p=((:A)-[*]->(c:C))
WITH p,
HEAD([(c)--(y:Y) | y ]) AS yDecoration,
[(c)-[*]->(d:D)-[*]->(e:E) | [d,e]] AS deDecoration
RETURN p, yDecoration, deDecoration
with this graph (multiple D-E)
this query
MATCH p=((:A)-[*]->(c:C))
WITH REDUCE(s='' , node IN nodes(p) | s + CASE WHEN s='' THEN '' ELSE '->' END + node.name) AS p,
HEAD([(c)--(y:Y) | y.name ]) AS yDecoration,
[(c)-[*]->(d:D)-[*]->(e:E) | [d.name,e.name]] AS deDecoration
RETURN p, yDecoration, deDecoration
returns
╒════════════╤═════════════╤═════════════════════════╕
│"p" │"yDecoration"│"deDecoration" │
╞════════════╪═════════════╪═════════════════════════╡
│"A2->B2->C2"│null │[] │
├────────────┼─────────────┼─────────────────────────┤
│"A1->B1->C1"│null │[["D2","E2"],["D1","E1"]]│
├────────────┼─────────────┼─────────────────────────┤
│"A3->B3->C3"│"Y1" │[] │
└────────────┴─────────────┴─────────────────────────┘

neo4j aggregate function by distance

I want to have some aggregated statistics by distance from root. For example,
(A)-[value:20]->(B)-[value:40]->(C)
(A)-[value:0]->(D)-[value:20]->(E)
CREATE (:firm {name:'A'}), (:firm {name:'B'}), (:firm {name:'C'}), (:firm {name:'D'}), (:firm {name:'E'});
MATCH (a:firm {name:'A'}), (b:firm {name:'B'}), (c:firm {name:'C'}), (d:firm {name:'D'}), (e:firm {name:'E'})
CREATE (a)-[:REL {value: 20}]->(b)->[:REL {value: 40}]->(c),
(a)-[:REL {value: 0}]->(d)->[:REL {value: 20}]->(e);
I want to get the average value of A's immediate neighbors and that of the 2nd layer neighbors, i.e.,
+-------------------+
| distance | avg |
+-------------------+
| 1 | 10 |
| 2 | 30 |
+-------------------+
How should I do it? I have tried the following
MATCH p=(n:NODE {name:'A'})-[r:REL*1..2]->(n:NODE)
RETURN length(p), sum(r:value);
But I am not sure how to operate on the variable-length path r.
Similarly, is it possible to get the cumulative value? i.e.,
+-------------------+
| name | cum |
+-------------------+
| B | 20 |
| C | 60 |
| D | 0 |
| E | 20 |
+-------------------+

The query below solves the first problem. Please note that it also solves the case where paths are not of equal length. I added (E)-[REL {value:99}]->(F)
MATCH path=(:firm {name:'A'})-[:REL*]->(leaf:firm)
WHERE NOT (leaf)-[:REL]->(:firm)
WITH COLLECT(path) AS paths, max(length(path)) AS longest
UNWIND RANGE(1,longest) AS depth
WITH depth,
REDUCE(sum=0, path IN [p IN paths WHERE length(p) >= depth] |
sum
+ relationships(path)[depth-1].value
) AS sumAtDepth,
SIZE([p IN paths WHERE length(p) >= depth]) AS countAtDepth
RETURN depth, sumAtDepth, countAtDepth, sumAtDepth/countAtDepth AS avgAtDepth
returning
╒═══════╤════════════╤══════════════╤════════════╕
│"depth"│"sumAtDepth"│"countAtDepth"│"avgAtDepth"│
╞═══════╪════════════╪══════════════╪════════════╡
│1 │20 │2 │10 │
├───────┼────────────┼──────────────┼────────────┤
│2 │60 │2 │30 │
├───────┼────────────┼──────────────┼────────────┤
│3 │99 │1 │99 │
└───────┴────────────┴──────────────┴────────────┘
The second question can be answered as follows:
MATCH (root:firm {name:'A'})
MATCH (descendant:firm) WHERE EXISTS((root)-[:REL*]->(descendant))
WITH root,descendant
WITH descendant,
REDUCE(sum=0,rel IN relationships([(descendant)<-[:REL*]-(root)][0][0]) |
sum + rel.value
) AS cumulative
RETURN descendant.name,cumulative ORDER BY descendant.name
returning
╒═════════════════╤════════════╕
│"descendant.name"│"cumulative"│
╞═════════════════╪════════════╡
│"B" │20 │
├─────────────────┼────────────┤
│"C" │60 │
├─────────────────┼────────────┤
│"D" │0 │
├─────────────────┼────────────┤
│"E" │20 │
├─────────────────┼────────────┤
│"F" │119 │
└─────────────────┴────────────┘

may I suggest your try it with a reduce function, you can retro fit it your code
// Match something name or distance..
MATCH
// If you have a condition put in here
// WHERE A<>B AND n.name = m.name
// WITH filterItems, collect(m) AS myItems
// Reduce will help sum/aggregate entire you are looking for
RETURN reduce( sum=0, x IN myItems | sum+x.cost )
LIMIT 10;

How to match node labels using OR?

match (p:Product {id:'5116003'})-[r]->(o:Attributes|ExtraAttribute) return p, o
How to match two possible node labels in such a query?
Per cybersam's suggestion, I changed to the follwoing:
MATCH (p:Product {id:'5116003'})-[r]->(o)
WHERE o:Attributes OR o:ExtraAttributes
**WHERE any(key in keys(o) WHERE toLower(key) contains 'weight')**
return o
Now I need to add the 2nd 'where' clause. How to modify that?

You can try using any() function:
match (p:Product {id:'5116003'})-[r]->(o)
where any (label in labels(o) where label in ['Attributes', 'ExtraAttribute'])
return p, o
Also, if you have APOC procedures, you can use apoc.path.expand path expander procedure that expands from start node following the given relationships from min to max-level adhering to the label filters.
match (p:Product {id:'5116003'})
call apoc.path.expand(p, null,"+Attributes|ExtraAttribute",0,1) yield path
with nodes(path) as nodes
// return p and o nodes
return nodes[0], nodes[1]
See more here.

These two single-label forms of your query:
MATCH (p:Product {id:'5116003'})-->(o:Attributes) RETURN p, o;
MATCH (p:Product {id:'5116003'})-->(o) WHERE o:Attributes RETURN p, o;
produce the same execution plan, as follows (I assume that there is an index on :Product(id)):
+-----------------+----------------+------+---------+------------------+--------------+
| Operator | Estimated Rows | Rows | DB Hits | Variables | Other |
+-----------------+----------------+------+---------+------------------+--------------+
| +ProduceResults | 0 | 0 | 0 | o, p | p, o |
| | +----------------+------+---------+------------------+--------------+
| +Filter | 0 | 0 | 0 | anon[33], o, p | o:Attributes |
| | +----------------+------+---------+------------------+--------------+
| +Expand(All) | 0 | 0 | 0 | anon[33], o -- p | (p)-->(o) |
| | +----------------+------+---------+------------------+--------------+
| +NodeIndexSeek | 0 | 0 | 1 | p | :Product(id) |
+-----------------+----------------+------+---------+------------------+--------------+
This two-label form of the second query above:
MATCH (p:Product {id:'5116003'})-->(o) WHERE o:Attributes OR o: ExtraAttribute RETURN p, o;
produces an execution plan that is very similar (and therefore probably not much more expensive):
+-----------------+----------------+------+---------+------------------+-------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Variables | Other |
+-----------------+----------------+------+---------+------------------+-------------------------------------+
| +ProduceResults | 0 | 0 | 0 | o, p | p, o |
| | +----------------+------+---------+------------------+-------------------------------------+
| +Filter | 0 | 0 | 0 | anon[33], o, p | Ors(o:Attributes, o:ExtraAttribute) |
| | +----------------+------+---------+------------------+-------------------------------------+
| +Expand(All) | 0 | 0 | 0 | anon[33], o -- p | (p)-->(o) |
| | +----------------+------+---------+------------------+-------------------------------------+
| +NodeIndexSeek | 0 | 0 | 1 | p | :Product(id) |
+-----------------+----------------+------+---------+------------------+-------------------------------------+
By the way, the first query in the answer by #BrunoPeres has a similar execution plan as well, but the Filter operation is very different. It is not clear which would be faster.
[UPDATE]
To answer your updated question: since you cannot have 2 back-to-back WHERE clauses, you can just add more terms to the already existing WHERE clause, like so:
MATCH (p:Product {id:'5116003'})-[r]->(o)
WHERE
(o:Attributes OR o:ExtraAttributes) AND
ANY(key in KEYS(o) WHERE TOLOWER(key) CONTAINS 'weight')
RETURN o;

Aggregating over Distinct Nodes

The data model I have is: (Item)-[:HAS_AN]->(ItemType) and both node types have a field called ID. Items can be related to multiple ItemTypes and in some cases, these ItemTypes can have the same IDs. I'm trying to populate a structure {ID:..., SameType:...}, where SameType = 1 if a node has the same item type as some node (with ID = 1234), and 0 otherwise.
First, I get the candidate list of nodes qList and the source node's ItemType:
MATCH (p:Item{ID:1234})-[:HAS_AN]->(i)
WITH i as pItemType, qList
Then, I go through qList, comparing each node's ItemType to pItemType (which is the ItemType of the source node):
UNWIND qList as q
MATCH (q)-[:HAS_AN]->(i)
WITH q.ID as qID, pItemType, i,
CASE i
WHEN pItemType THEN 1
ELSE 0
END as SameType
RETURN DISTINCT i, qID, pItemType, SameType
The problem I have is when some nodes have two ItemTypes with the same ID. This gives results where some of the entries are duplicates:
{ | | { |
"ID": 18 | 35258417 | "ID": 71 | 0
} | | } |
{ | | { |
"ID": 18 | 35258417 | "ID": 71 | 0
} | | } |
while I'd like to only take one such row, if more than one exists.
Placing DISTINCT where I have in the last part of the query doesn't seem to work. What's the best way to filter out such duplicates?
Update:
Here is a sample data subset: http://console.neo4j.org/r/f74pdq
Here are the queries that I'm running
MATCH (q:Item) WHERE q.ID <> 1234 WITH COLLECT(DISTINCT(q)) as qList
MATCH (p:Item{ID:1234})-[:HAS_AN]->(i:ItemType) WITH i as pItemType, qList
UNWIND qList as q
MATCH (q)-[:HAS_AN]->(i:ItemType) WITH q.ID as qID, pItemType, i,
CASE i
WHEN pItemType THEN 1
ELSE 0
END as SameType
RETURN DISTINCT i, qID, pItemType, SameType
In this example, Item with ID = 2 has two HAS_AN relations with 2 ItemType nodes with the same ID. I would like only one of them to be returned.

I've tried to simplify your query. Take a look:
MATCH (:Item {ID : 1234})-[:HAS_AN]->(target:ItemType)
MATCH (item:Item)-[:HAS_AN]->(itemType:ItemType)
WHERE item.ID <> 1234
WITH
itemType.ID AS i,
item.ID AS qID,
collect({
pItemType : target,
SameType : CASE exists((item)-[:HAS_AN]-(target))
WHEN true THEN 1 ELSE 0 END
})[0] as Item
RETURN i, qID, Item.pItemType AS pItemType, Item.SameType AS SameType
The trick is in the two lines after WITH. At this point I'm grouping by itemType.ID and item.ID and not ( and not itemType and item). In your original query you are using pItemType to group. This does not work because the two ItemTypes with ID = 34 are different nodes although they have the same ID.
The output from your console:
+-------------------------------------+
| i | qID | pItemType | SameType |
+-------------------------------------+
| 31 | 4 | Node[2]{ID:5} | 0 |
| 5 | 3 | Node[2]{ID:5} | 1 |
| 31 | 5 | Node[2]{ID:5} | 0 |
| 45 | 5 | Node[2]{ID:5} | 0 |
| 5 | 1 | Node[2]{ID:5} | 1 |
| 34 | 2 | Node[2]{ID:5} | 0 |
+-------------------------------------+
6 rows
33 ms

Thanks to #Bruno's solution, I was able to get the right answers. However, the original solution did not work right off the bat for me for two reasons - I needed the qList since I was referring to it later, and it had approximately 4 times the DB hits as the query in my question. So, I tried a few optimizations that brought the number of DB hits down to half, and am sharing it here for posterity.
MATCH (q:Item) WHERE q.ID <> 1234 WITH COLLECT(DISTINCT(q)) as qList
MATCH (p:Item{ID:1234})-[:HAS_AN]->(i:ItemType) WITH i as pItemType, qList
UNWIND qList as item
MATCH (item)-[:HAS_AN]->(i)
WITH
i, pItemType,
item.ID AS qID,
collect({
pItemType : pItemType,
SameType : CASE i.ID
WHEN pItemType.ID THEN 1 ELSE 0 END
})[0] as Item
RETURN i, qID, Item.pItemType AS pItemType, Item.SameType AS SameType
Turns out running MATCH (item:Item)-[:HAS_AN]->(itemType:ItemType) was adding a Filter operation that took almost as many DB hits as it had matches.

Neo4J returned many values

I have query like this:
MATCH p = (ob:Obiect)<--(w:Word { value:'human' })-[*]-(x) RETURN {id: id(x), value: x.value}
I am returning "x" but i want to return also the “w”. The big question is: how to have “w” returned?
I tried like this:
MATCH p = (ob:Obiect)<--(w:Word { value:'human' })-[*]-(x) RETURN {id: id(x), value: x.value, rootid: id(w)}
but then the output looks like
x.id | x.value | w.id | w.value
x.id | x.value | w.id | w.value
x.id | x.value | w.id | w.value
but for NULL values of x there are no values of “w”, but I also need them.

try OPTIONAL MATCH, see: http://neo4j.com/docs/2.1.5/query-optional-match.html
MATCH (ob:Obiect)<--(w:Word { value:'human' })
OPTIONAL MATCH (w)-[*]-(x)
RETURN {id: id(x), value: x.value, rootid: id(w)}

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

neo4j: type counts by distance - neo4j

Related

extract decorating nodes if it exists but still return path if decorating nodes does not exist

neo4j aggregate function by distance

How to match node labels using OR?

Aggregating over Distinct Nodes

Neo4J returned many values

Categories

Resources