I am writing a query to display a graph of all the journals and their publication places (cities). I would like to filter the query so that it selects only the cities that are the publication place of more than 3 journals.
My attempt does give me the cities and the count, but I cannot manage to get journal.name and the relationship into the result.
MATCH (j:journal)-[p:publication_city]->(c:City)
WITH c, count(c) as cnt
WHERE cnt > 3
RETURN c, cnt
ORDER BY cnt
Whatever change I make to add the journal variable to the query above (e.g. WITH c, count(c) as cnt, j) leads to an empty result.
Anyone who knows what I am doing wrong?
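Adding j to the WITH clause makes it a grouping key, so each (c, j) group normally has a count of 1 and the filter cnt > 3 removes every row. Instead, you can use the COLLECT function to gather the journals of each city while counting them, keep only the cities with more than 3 journals, and then UNWIND the list to get one row per journal again. UNWIND works like a "for loop" over a list.
MATCH (j:journal)-[:publication_city]-(c:City)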
WITH c, count(c) as cnt, collect(j) as journals WHERE cnt > 3
UNWIND journals as journal
RETURN journal, c, cnt
ORDER BY cnt
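Since the question also asks for the relationship, one possible variation collects each journal together with its relationship. This is only a sketch that assumes the labels and relationship type from the question; the names pairs, pair and rel are introduced here for illustration.

```cypher
// Group by city only; keep each journal paired with its relationship.
MATCH (j:journal)-[p:publication_city]->(c:City)
WITH c, count(j) AS cnt, collect([j, p]) AS pairs
WHERE cnt > 3
// Expand the collected pairs back into one row per journal.
UNWIND pairs AS pair
RETURN pair[0] AS journal, pair[1] AS rel, c, cnt
ORDER BY cnt
```

Returning the nodes and the relationship as separate columns should let the browser draw the graph, while cnt stays available for ordering.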
Related
We have a large graph (over 1 billion edges) that has multiple relationship types between nodes.
In order to check the number of nodes that have a single unique relationship between nodes (i.e. a single relationship between two nodes per type, which otherwise would not be connected) we are running the following query:
MATCH (n)-[:REL_TYPE]-(m)
WHERE size((n)-[]-(m))=1 AND id(n)>id(m)
RETURN COUNT(DISTINCT n) + COUNT(DISTINCT m)
To demonstrate a similar result, the sample code below can be run on the movie graph after running :play movies in an empty graph. It returns 4 nodes (in this case we are asking for nodes with 3 types of relationships):
MATCH (n)-[]-(m)
WHERE size((n)-[]-(m))=3 AND id(n)>id(m)
RETURN COUNT(DISTINCT n) + COUNT(DISTINCT m)
Is there a better/more efficient way to query the graph?
The following query is more performant, since it only scans each relationship once [whereas size((n)--(m)) will cause relationships to be scanned multiple times]. It also specifies a relationship direction to filter out half of the relationship scans, and to avoid the need for comparing native IDs.
MATCH (n)-->(m)
WITH n, m, COUNT(*) AS cnt
WHERE cnt = 3
RETURN COUNT(DISTINCT n) + COUNT(DISTINCT m)
NOTE: It is not clear what you are using the COUNT(DISTINCT n) + COUNT(DISTINCT m) result for, but be aware that it is possible for some nodes to be counted twice after the addition.
[UPDATE]
If you want to get the actual number of distinct nodes that pass your filter, here is one way to do that:
MATCH (n)-->(m)
WITH n, m, COUNT(*) AS cnt
WHERE cnt = 3
WITH COLLECT(n) + COLLECT(m) AS nodes
UNWIND nodes AS node
RETURN COUNT(DISTINCT node)
I have 2 node types, let's say A and B, and a relationship, let's call it 'a_has_b', with the property 'value'.
First I want to count the number of relationships a specific node of type A has.
MATCH (a:A)-[r:a_has_b]->(b:B)
WHERE a.id='123'
RETURN COUNT(r) as count
I also want to get the top n B's ordered by the property from the relationship
MATCH (a:A)-[r:a_has_b]->(b:B)
WHERE a.id='123'
RETURN r, b
ORDER BY r.value
LIMIT 3
Clearly I am doing the same thing twice and only changing the return value.
How can I combine them together to get both needed results?
You can combine COLLECT with list slicing:
MATCH (a:A)-[r:a_has_b]->(b:B)
WHERE a.id='123'
WITH a, r, b
ORDER BY r.value
RETURN a, COUNT(r) AS count, COLLECT([r,b])[0..3] AS rels
I want to display the users whose transaction amounts sum to more than 5000.
How do I display the relationship [:TRANS_AMOUNT] too?
My query
MATCH(c)-[r:TRANS_AMOUNT]->(e)
WITH sum(toInt(e.totalAmount))as l,c
WHERE l>5000
RETURN c,l;
The above query groups the sum by customer and checks if sum amount is greater than 5000. How do I display the relationships where this happens too?
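Collect the relationships in the WITH statement, then unwind and return them. Adding r to the WITH directly would make it a grouping key, so the sum would be computed per relationship instead of per customer:
MATCH (c)-[r:TRANS_AMOUNT]->(e)
WITH c, sum(toInt(e.totalAmount)) AS l, collect(r) AS rels
WHERE l > 5000
UNWIND rels AS r
RETURN c, l, r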
You can also aggregate the relationships in order to have one row per user in the result:
MATCH (c)-[r:TRANS_AMOUNT]->(e)
WITH sum(toInt(e.totalAmount))as l, c, collect(r) as rels
WHERE l>5000
RETURN c, l, rels
I am looking for a way to perform one aggregate function on top of the results of another one. In particular, I would like to join the following two queries:
MATCH (e :Event) - [:ATTENDED_BY] -> (a :Person)
WITH e, collect(a) AS attendants
WHERE ALL (a in attendants WHERE a.Company="XYZ")
RETURN e.name AS name, count(*) as number_occurrences
ORDER BY number_occurrences DESC;
MATCH (e:Event) - [:ATTENDED_BY] -> (a :Person)
WITH e, collect(a) AS attendants
WHERE ALL (a in attendants WHERE a.Company="XYZ")
WITH e.name AS name, count(*) as number_occurrences
RETURN percentileDisc(number_occurrences,0.95) as percentile;
The first query gives all the event names where only people from a single company ("XYZ") attended, as well as the number of occurrences of those events. The second one returns the minimum number of occurrences for the top 5% most frequent events. What I would like to get is the names and number of occurrences of these 5% most frequent events. Any suggestions?
I managed to solve the query using the WITH clause, of course, but the key was to understand its usage properly. The idea is that only the variables passed in the last WITH clause remain visible afterwards. That is why, after we get the "percentile" variable in the first part of the query, we need to keep passing it along in all the subsequent WITH clauses in the second part of the query.
MATCH (e :Event) - [:ATTENDED_BY] -> (a :Person)
WITH e, collect(a) AS attendants
WHERE ALL (a in attendants WHERE a.Company="XYZ")
WITH e.name AS name, count(*) as number_occurrences
WITH percentileDisc(number_occurrences,0.95) as percentile
MATCH (e :Event) - [:ATTENDED_BY] -> (a :Person)
WITH percentile, e, collect(a) AS attendants
WHERE ALL (a in attendants WHERE a.Company="XYZ")
WITH percentile, e.name AS name, count(*) as number_occurrences
WHERE number_occurrences > percentile
RETURN name, number_occurrences
ORDER BY number_occurrences DESC
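A possible variation that avoids repeating the MATCH is to collect the per-event rows at the same time as the percentile is computed and then unwind them again. This is a sketch, not a tested replacement: it assumes the collected list fits comfortably in memory, and the names rows, row and n are introduced here for illustration.

```cypher
MATCH (e:Event)-[:ATTENDED_BY]->(a:Person)
WITH e, collect(a) AS attendants
WHERE ALL (a IN attendants WHERE a.Company = "XYZ")
WITH e.name AS name, count(*) AS number_occurrences
// No grouping key here: both aggregates run over all rows at once.
WITH collect({name: name, n: number_occurrences}) AS rows,
     percentileDisc(number_occurrences, 0.95) AS percentile
UNWIND rows AS row
WITH row, percentile
WHERE row.n > percentile
RETURN row.name AS name, row.n AS number_occurrences
ORDER BY number_occurrences DESC
```

The filter keeps the same comparison (>) as the query above, so both versions should select the same events.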
Each node has multiple incoming relationships with different properties. I want to find the incoming relationship property, say "rel_name", that has the same value for ALL (x IN nodes(p)).
You can try this (it returns rel_name with count of nodes and collection of nodes):
MATCH (a)<-[r]-()
WITH r.rel_name AS name, count(a) AS count, collect(a) AS coll
WHERE count > 1
RETURN name, count, coll
ORDER BY name
or this (it returns all duplicated nodes):
MATCH (a)<-[r]-()
WITH r.rel_name AS name, count(a) AS count, collect(a) AS coll
WHERE count > 1
UNWIND coll AS c
RETURN c