I want to display the users whose total transaction amount is greater than 5000.
How do I display the [:TRANS_AMOUNT] relationships too?
My query:
MATCH(c)-[r:TRANS_AMOUNT]->(e)
WITH sum(toInt(e.totalAmount))as l,c
WHERE l>5000
RETURN c,l;
The above query groups the sum by customer and checks if sum amount is greater than 5000. How do I display the relationships where this happens too?
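Keep the relationships in the aggregation with collect() and unwind them after the filter. Adding r to the WITH on its own would make it part of the grouping key, so the sum would be computed per relationship instead of per customer:
MATCH (c)-[r:TRANS_AMOUNT]->(e)
WITH c, sum(toInt(e.totalAmount)) AS l, collect(r) AS rels
WHERE l > 5000
UNWIND rels AS r
RETURN c, l, r
To get one row per user instead, return the collected relationships without unwinding them: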
MATCH (c)-[r:TRANS_AMOUNT]->(e)
WITH sum(toInt(e.totalAmount))as l, c, collect(r) as rels
WHERE l>5000
RETURN c, l, rels
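Every non-aggregated column in a WITH or RETURN becomes part of the grouping key, so it matters whether r is listed on its own or wrapped in collect(). The following is a minimal plain-Python sketch of that grouping behaviour, not Cypher; the rows and the helper name `sum_by` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows as (customer, relationship id, totalAmount).
rows = [
    ("alice", "r1", 3000),
    ("alice", "r2", 4000),
    ("bob", "r3", 1000),
]

def sum_by(rows, key):
    """Sum totalAmount per grouping key, like sum() next to non-aggregated columns."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[2]
    return dict(totals)

# Grouping by customer only: one total per customer.
per_customer = sum_by(rows, key=lambda row: row[0])

# Grouping by customer and relationship: every group holds a single row,
# so each "sum" is just one transaction amount.
per_customer_and_rel = sum_by(rows, key=lambda row: (row[0], row[1]))

over_limit = [c for c, total in per_customer.items() if total > 5000]
over_limit_per_rel = [k for k, total in per_customer_and_rel.items() if total > 5000]
```

In this made-up data only the per-customer grouping finds a customer above 5000, because no single transaction exceeds the limit on its own.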
I have a graph where a pair of nodes can have several relationships between them.
I would like to count the relationships between each pair of nodes, and set the count as a property on each relationship.
I tried something like:
MATCH (s:LabeledExperience)-[r:NextExp]->(e:LabeledExperience)
with s, e, r, length(r) as cnt
MATCH (s2:LabeledExperience{name:s.name})-[r2:NextExp{name:r.name}]->(e2:LabeledExperience{name: e.name})
SET r2.weight = cnt
But this always sets the weight to one.
I also tried:
MATCH ()-[r:NextExp]->()
with r, length(r) as cnt
MATCH ()-[r2:NextExp{name:r.name}]->()
SET r2.weight = cnt
But this takes too much time since there are more than 90k relationships and there is no index (since it is not possible to have them on edges).
They are always set to 1 because of the way you are counting.
When you group by s, e, r that is always going to result in a single row. But if you collect(r) for every s, e then you will get a collection of all of the :NextExp relationships between those two nodes.
Also, length() measures the length of a matched path (the number of relationships in it) and should not be applied directly to a relationship.
Match the relationships and collect them for each pair of nodes. Then iterate over each relationship in the collection and set its weight to the size of the collection.
MATCH (s:LabeledExperience)-[r:NextExp]->(e:LabeledExperience)
WITH s, e, collect(r) AS rels
UNWIND rels AS rel
SET rel.weight = size(rels)
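The collect-then-unwind step can be modelled in plain Python to check the expected weights. This is only a sketch of the logic, not Cypher; the relationship tuples and the names `by_pair` and `weight` are hypothetical.

```python
from collections import defaultdict

# Hypothetical relationships as (start node, end node, relationship id).
rels = [
    ("a", "b", 1),
    ("a", "b", 2),
    ("a", "b", 3),
    ("b", "c", 4),
]

# Counterpart of: WITH s, e, collect(r) AS rels
by_pair = defaultdict(list)
for start, end, rel_id in rels:
    by_pair[(start, end)].append(rel_id)

# Counterpart of: UNWIND rels AS rel SET rel.weight = size(rels)
weight = {}
for group in by_pair.values():
    for rel_id in group:
        weight[rel_id] = len(group)
```

Each relationship ends up with the number of relationships that share its start and end node, which is what grouping by s, e, r could never produce.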
I was practicing with the Movie Database from Neo4j and wrote the following query:
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(a)
RETURN a
This query returns 3 rows. But if I go to the graph view in the web editor and expand the "Tom Hanks" node, I see one movie that Tom Hanks both directed and acted in, while the rest of his connected movies only have the ACTED_IN relationship. In this case I want to filter Tom Hanks out of the result, since he has at least one movie connected by only one relationship (either ACTED_IN or DIRECTED).
PS: My expected result would be only the row representing the node "Clint Eastwood".
So you only want results where the person acted in and directed the same movies, but never simply acted in, without directing, or directed, without acting.
You could use this approach:
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(a)
WITH a, count(m) as actedDirectedCount
WHERE size((a)-[:ACTED_IN]->()) = actedDirectedCount AND size((a)-[:DIRECTED]->()) = actedDirectedCount
RETURN a
Though you can simplify this a bit by combining the relationship types in the pattern used in your WHERE clause like so:
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(a)
WITH a, count(m) as actedDirectedCount
WHERE size((a)-[:ACTED_IN|DIRECTED]->()) = actedDirectedCount * 2
RETURN a
If the actedDirectedCount = 3 movies, then there must be at a minimum 3 :ACTED_IN relationships and 3 :DIRECTED relationships, so a minimum of 6 relationships using either relationship. If there are any more than this, then there are additional movies that they either acted in or directed, so we'd filter that out.
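The counting argument can be checked with sets in plain Python. This is a rough model, not Cypher, and it assumes at most one ACTED_IN and one DIRECTED relationship per person and movie; the people, movies, and the function name `only_acts_in_own_films` are invented for the example.

```python
# Hypothetical data: person -> movies acted in / directed.
acted = {
    "Clint": {"Unforgiven"},
    "Tom": {"That Thing You Do", "Apollo 13", "Cast Away"},
}
directed = {
    "Clint": {"Unforgiven"},
    "Tom": {"That Thing You Do"},
}

def only_acts_in_own_films(person):
    """True if every movie the person acted in or directed has both relationships."""
    # Movies matched by the ACTED_IN + DIRECTED pattern (actedDirectedCount).
    both = acted[person] & directed[person]
    # Number of relationships matched by the ACTED_IN|DIRECTED pattern.
    total_rels = len(acted[person]) + len(directed[person])
    # MATCH needs at least one movie with both relationships.
    return len(both) > 0 and total_rels == 2 * len(both)
```

The equality can only hold when the two sets are identical, because the number of shared movies can never exceed the size of either set.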
Three options come to mind:
1.
MATCH (m:Movie)<-[:DIRECTED]-(a:Person)
with a, collect(distinct m) as directedMovies
match (a)-[:ACTED_IN]->(m:Movie)
with a, directedMovies, collect(distinct m) as actedMovies
with a where all(x in directedMovies where x in actedMovies) and all(x in actedMovies where x in directedMovies)
return a
2.
MATCH (m:Movie)<-[:DIRECTED]-(a:Person)
with * order by id(m)
with a, collect(distinct m) as directedMovies
match (a)-[:ACTED_IN]->(m:Movie)
with a, directedMovies, m order by id (m)
with a, directedMovies, collect(distinct m) as actedMovies
with a where actedMovies=directedMovies
return a
3.
MATCH (m:Movie)<-[:DIRECTED]-(a:Person)
with a, collect(distinct m) as directedMovies
with * where all(x in directedMovies where (a)-[:ACTED_IN]->(x))
MATCH (m:Movie)<-[:ACTED_IN]-(a)
with a, collect(distinct m) as actedMovies
with * where all(x in actedMovies where (a)-[:DIRECTED]->(x))
return a
The first two are equally expensive and the last one is a bit more expensive.
We have a large graph (over 1 billion edges) that has multiple relationship types between nodes.
In order to check the number of nodes that have a single unique relationship between nodes (i.e. a single relationship between two nodes per type, which otherwise would not be connected) we are running the following query:
MATCH (n)-[:REL_TYPE]-(m)
WHERE size((n)-[]-(m))=1 AND id(n)>id(m)
RETURN COUNT(DISTINCT n) + COUNT(DISTINCT m)
To demonstrate a similar result, the sample query below can be run on the movie graph (after running :play movies in an empty graph). It returns 4 nodes (in this case we are asking for nodes with 3 types of relationships):
MATCH (n)-[]-(m)
WHERE size((n)-[]-(m))=3 AND id(n)>id(m)
RETURN COUNT(DISTINCT n) + COUNT(DISTINCT m)
Is there a better/more efficient way to query the graph?
The following query is more performant, since it only scans each relationship once [whereas size((n)--(m)) will cause relationships to be scanned multiple times]. It also specifies a relationship direction to filter out half of the relationship scans, and to avoid the need for comparing native IDs.
MATCH (n)-->(m)
WITH n, m, COUNT(*) AS cnt
WHERE cnt = 3
RETURN COUNT(DISTINCT n) + COUNT(DISTINCT m)
NOTE: It is not clear what you are using the COUNT(DISTINCT n) + COUNT(DISTINCT m) result for, but be aware that it is possible for some nodes to be counted twice after the addition.
[UPDATE]
If you want to get the actual number of distinct nodes that pass your filter, here is one way to do that:
MATCH (n)-->(m)
WITH n, m, COUNT(*) AS cnt
WHERE cnt = 3
WITH COLLECT(n) + COLLECT(m) AS nodes
UNWIND nodes AS node
RETURN COUNT(DISTINCT node)
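The double counting mentioned in the note is easy to reproduce on a toy example. Below is a small plain-Python sketch, not Cypher, with made-up directed relationships in which node b is the end of one qualifying pair and the start of another.

```python
from collections import Counter

# Hypothetical directed relationships as (start, end) pairs.
rels = [("a", "b")] * 3 + [("b", "c")] * 3 + [("c", "d")]

# Counterpart of: WITH n, m, COUNT(*) AS cnt WHERE cnt = 3
pair_counts = Counter(rels)
pairs = [pair for pair, cnt in pair_counts.items() if cnt == 3]

starts = {n for n, _ in pairs}
ends = {m for _, m in pairs}

# COUNT(DISTINCT n) + COUNT(DISTINCT m): b is counted on both sides.
summed = len(starts) + len(ends)

# COUNT(DISTINCT node) after COLLECT(n) + COLLECT(m) and UNWIND.
distinct_nodes = len(starts | ends)
```

Here the sum of the two distinct counts is one higher than the number of distinct nodes, because b appears on both sides.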
I have a graph with one node type 'nodeName' and one relationship type 'relName'. Each node pair has 0-1 'relName' relationships with each other but each node can be connected to many nodes.
Given an initial list of nodes (I'll refer to this list as the query subset) I want to:
1. Find all the nodes that connect to the query subset
I'm currently doing this (which may be overly convoluted):
MATCH (a: nodeName)-[r:relName]-()
WHERE (a.name IN ['query list'])
WITH a
MATCH (b: nodeName)-[r2:relName]-()
WHERE NOT (b.name IN ['query list'])
WITH a, b
MATCH (a)--(b)
RETURN DISTINCT b
2. Then, for each connected node (b), I want to return the SUM of the weights that connect it to the query subset
For example, if node b1 has 4 edges that connect to nodes in the query subset, I would like to RETURN SUM(r2.weight) AS totalWeight for b1. I actually need a list of all the b nodes ordered by totalWeight.
No. 2 is where I'm stuck. I've been reading the docs about FOREACH and reduce() but I'm not sure how to apply them here.
Speed is important, as I have 30,000 nodes and 1.5M edges. If you have any suggestions regarding this, please throw them into the mix.
Many thanks
Matt
Why do you need so many MATCH statements? You can specify the a nodes and the b nodes in a single MATCH statement and select only those that have a relationship between them.
After that, just return the b nodes and the sum of the weights. The b nodes automatically act as the grouping key when they are returned alongside an aggregation function such as sum().
MATCH (a:nodeName)-[r:relName]-(b:nodeName)
WHERE (a.name IN ['query list']) AND NOT((b.name IN ['query list']))
RETURN b.name, sum(r.weight) as weightSum order by weightSum
I think we can simplify that query a bit.
MATCH (a: nodeName)
WHERE (a.name IN ['query list'])
WITH collect(a) as subset
UNWIND subset as a
MATCH (a)-[r:relName]-(b)
WHERE NOT b in subset
RETURN b, sum(r.weight) as totalWeight
ORDER BY totalWeight ASC
Since sum() is an aggregating function, it will make the non-aggregation variables the grouping key (in this case b), so the sum is per b node, then we order them (switch to DESC if needed).
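For a quick sanity check of the expected output, the same grouping can be modelled in plain Python. This is a sketch of the logic only, not Cypher; the node names, weights, and the variables `subset`, `edges`, and `ranked` are hypothetical.

```python
from collections import defaultdict

subset = {"q1", "q2"}  # hypothetical query subset

# Hypothetical undirected weighted edges as (node, node, weight).
edges = [
    ("q1", "b1", 2.0),
    ("q2", "b1", 3.0),
    ("b2", "q1", 1.0),
    ("q1", "q2", 9.0),  # both ends in the subset: ignored
    ("b1", "b2", 4.0),  # neither end in the subset: ignored
]

total_weight = defaultdict(float)
for u, v, w in edges:
    # The pattern is undirected, so try both orientations of each edge.
    for a, b in ((u, v), (v, u)):
        if a in subset and b not in subset:
            total_weight[b] += w

# Counterpart of: ORDER BY totalWeight ASC
ranked = sorted(total_weight.items(), key=lambda item: item[1])
```

Only edges with exactly one end inside the subset contribute, and each outside node gets one total.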
I have 2 node types, let's say A and B, and a relationship 'a_has_b' with a property 'value'.
First I want to count the number of relationships a specific node of type A has.
MATCH (a:A)-[r:a_has_b]->(b:B)
WHERE a.id='123'
RETURN COUNT(r) as count
I also want to get the top n B's ordered by the property from the relationship
MATCH (a:A)-[r:a_has_b]->(b:B)
WHERE a.id='123'
RETURN r, b
ORDER BY r.value
LIMIT 3
Clearly, I am doing the same thing twice and only changing the return value.
How can I combine them together to get both needed results?
You can combine collect with a list slice:
MATCH (a:A)-[r:a_has_b]->(b:B)
WHERE a.id='123'
WITH a, r, b
ORDER BY r.value
RETURN a, COUNT(r) AS count, COLLECT([r, b])[0..3] AS rels
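The idea is to sort once, then take both the count and the first few items from the same ordered rows. A minimal plain-Python model of that, not Cypher, with invented values and B node names:

```python
# Hypothetical (relationship value, B node) rows for one A node.
rows = [(5, "b5"), (1, "b1"), (4, "b4"), (2, "b2")]

ordered = sorted(rows, key=lambda row: row[0])  # ORDER BY r.value
count = len(ordered)                            # COUNT(r)
top = ordered[:3]                               # COLLECT([r, b])[0..3]
```

As with the Python slice, the Cypher slice [0..3] excludes the end index, so it keeps three items. The ordering here is ascending; use ORDER BY r.value DESC if the top n should be the largest values.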