How to collect a limited number of nodes using Cypher? - neo4j

If I want to perform a MATCH query on the top 25 results of a previous MATCH, how would I do this in Cypher?
My first query is like:
START inputMovie=node(0)
MATCH (inputGenre:Genre)-[:IS_GENRE]-(inputMovie:Movie)
WITH inputMovie, COLLECT(inputGenre) AS inputGenres
MATCH (genre:Genre)-[o:IS_GENRE]-(outMovies:Movie)
WITH inputGenres, outMovies, genre
WHERE (genre IN inputGenres)
RETURN outMovies.title, count(genre) AS foo
ORDER BY foo desc
LIMIT 25
How would I then perform a MATCH query on only those 25 results? Can I collect a number of nodes? COLLECT(m) would collect every single node matching the WHERE constraint. Is there a way to collect the top 25 from a MATCH query?

[EDITED]
First of all, your current query can be improved. The following shorter and faster query should be equivalent (and avoids using the deprecated START clause):
MATCH (outMovie:Movie)-[:IS_GENRE]-(:Genre)-[:IS_GENRE]-(inputMovie:Movie)
WHERE ID(inputMovie) = 0
MATCH (outMovie)-[:IS_GENRE]-(genre:Genre)
RETURN outMovie.title, count(genre) AS foo
ORDER BY foo DESC
LIMIT 25
In general, you can replace RETURN with WITH if you want to extend an existing query. You just have to provide an alias for returned values that do not already have an identifier (in your case, outMovies.title would need to be aliased).
Here is a simple example with your query:
MATCH (outMovie:Movie)-[:IS_GENRE]-(:Genre)-[:IS_GENRE]-(inputMovie:Movie)
WHERE ID(inputMovie) = 0
MATCH (outMovie)-[:IS_GENRE]-(genre:Genre)
WITH outMovie.title AS title, count(genre) AS foo
ORDER BY foo DESC
LIMIT 25
RETURN COUNT(title) AS num_titles;
The returned num_titles will be at most 25.
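To run a further MATCH on only those 25 movies, as the question asks, one option is to carry the node itself (not just its title) through the WITH. The following is a minimal sketch, not a tested query against your data: the ACTED_IN relationship type, the Person label and the name property are assumptions made up for illustration, and the first MATCH counts the genres shared with the input movie directly.

```cypher
MATCH (inputMovie:Movie)-[:IS_GENRE]-(genre:Genre)-[:IS_GENRE]-(outMovie:Movie)
WHERE ID(inputMovie) = 0
// Keep the node, not only the title, so later clauses can match on it
WITH outMovie, count(genre) AS sharedGenres
ORDER BY sharedGenres DESC
LIMIT 25
// Hypothetical follow-up pattern: replace ACTED_IN and Person with your own
MATCH (outMovie)<-[:ACTED_IN]-(actor:Person)
RETURN outMovie.title AS title, sharedGenres, collect(actor.name) AS actors
```

If you would rather have the 25 nodes in a single list, adding WITH collect(outMovie) AS topMovies right after the LIMIT should give you that. Note that a plain MATCH drops movies that have no match for the follow-up pattern; OPTIONAL MATCH keeps them.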

Related

Neo4j count Query

match(m:master_node:Application)-[r]-(k:master_node:Server)-[r1]-(n:master_node)
where (m.name contains '' and (n:master_node:DeploymentUnit or n:master_node:Schema))
return distinct m.name,n.name
Hi, I am trying to get the total number of records for the above query. How do I change the query to use the count function and get the record count directly?
Thanks in advance
The following query uses the aggregating function COUNT. Distinct pairs of m.name, n.name values are used as the "grouping keys".
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
RETURN m.name, n.name, COUNT(*) AS cnt
I assume that m.name contains '' in your query was an attempt to test for the existence of m.name. This query uses the EXISTS() function to test that more efficiently.
[UPDATE]
To determine the number of distinct n and m pairs in the DB (instead of the number of times each pair appears in the DB):
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
WITH DISTINCT m.name AS n1, n.name AS n2
RETURN COUNT(*) AS cnt
Some things to consider for speeding up the query even further:
Remove unnecessary label tests from the MATCH pattern. For example, can we omit the master_node label test from any nodes? In fact, can we omit all label testing for any nodes without affecting the validity of the result? (You will likely need a label on at least one node, though, to avoid scanning all nodes when kicking off the query.)
Can you add a direction to each relationship (to avoid having to traverse relationships in both directions)?
Specify the relationship types in the MATCH pattern. This will filter out unwanted paths earlier. Once you do so, you may also be able to remove some node labels from the pattern as long as you can still get the same result.
Use the PROFILE clause to evaluate the number of DB hits needed by different Cypher queries.
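As a rough sketch of what these suggestions could look like when combined: the relationship types RUNS_ON and HOSTS and both directions below are invented for illustration, since the question does not say how the nodes are connected. The master_node label is dropped on the assumption that every Application and Server node carries it anyway, and m.name IS NOT NULL is used as another way to write the property existence check.

```cypher
// PROFILE runs the query and reports DB hits per operator
PROFILE
MATCH (m:Application)-[:RUNS_ON]->(:Server)-[:HOSTS]->(n)
WHERE m.name IS NOT NULL AND (n:DeploymentUnit OR n:Schema)
WITH DISTINCT m.name AS n1, n.name AS n2
RETURN count(*) AS cnt
```

Comparing the PROFILE output of this form against the original should show whether the extra constraints actually reduce DB hits on your data.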
You can find examples of how to use count in the Neo4j docs.
In your case, the first example there should work. It uses
count(*)
to return a count for each returned item.

Get the count of multiple properties in neo4j

I am trying to combine 2 Cypher queries into one for performance but have not succeeded.
I need to get the counts of multiple properties, each independent of the other, in the same query.
EX 1:
Match (n)
RETURN n.foo, count(*) AS count
EX 2:
Match (n)
RETURN n.bar, count(*) AS count
I was hoping I could just run both:
Match (n)
RETURN n.foo, count(*) AS fooCount, n.bar, count(*) AS barCount
But this returns the same count for both as it is finding where they both match. Not what I want.
So was looking for a way to group them to be unique like:
Match (n)
RETURN {n.foo, count(*) AS fooCount}, {n.bar, count(*) AS barCount}
Obviously this is not valid syntax but shows what I am trying to do.
Any assistance on this is of course appreciated.
It's best to run these aggregations back to back. Doing them all at once isn't a good idea for this kind of query, because a single aggregation groups by both properties together and won't work in your favor.
You could try this:
MATCH (n)
WITH n.bar as bar, count(*) AS count
WITH collect({bar:bar, count:count}) as barCounts
MATCH (n)
WITH barCounts, n.foo as foo, count(*) AS count
WITH barCounts, collect({foo:foo, count:count}) as fooCounts
RETURN barCounts, fooCounts
Since you are trying to aggregate separate query results, you can also use UNION as a quick and easy way to return both at the same time.
Match (n)
RETURN "foo" as type, n.foo as value, count(*) AS count
UNION ALL
Match (n)
RETURN "bar" as type, n.bar as value, count(*) AS count
Just a few notes: both RETURN clauses of a UNION must have the same column names.
Also, the "type" column in the example isn't necessary, but it shows how you can add filler if both queries don't have the same number of return columns. (Or if you want to tell which query the result is from.) If there is a "foo" and a "bar" with the same value+count, UNION ALL will keep both, and UNION will drop the duplicate (if you remove the type column).
Maybe it's outdated, but just in case someone needs it, I've found another approach using an APOC function, which avoids running the same MATCH (n) multiple times. In your case, it could be something like:
MATCH (n)
WITH collect(n.bar) as bars, collect(n.foo) as foos
WITH apoc.coll.frequenciesAsMap(bars) as barCounts, apoc.coll.frequenciesAsMap(foos) as fooCounts
RETURN barCounts, fooCounts
Single MATCH, multiple Counts.
Hope this helps someone!

Getting results count from a query with LIMIT clause

I have a Neo4j database with thousands of nodes.
I'm using this query to find nodes which contains some text inside the desired field:
MATCH (n:MYNODE)
WHERE n.myfield CONTAINS {textToSearch}
RETURN n
ORDER BY n.myfield ASC
LIMIT 50
This query works, and returns the first 50 results ordered by n.myfield.
Let's say 340 nodes match the search criteria: the first 50 get returned. Is there a way to return also the total count? I would like to have the 50 nodes along with the total count (340) for displaying purposes.
I would do a second query like this:
MATCH (n:MYNODE)
WHERE n.myfield CONTAINS {textToSearch}
RETURN count(n)
Is there a way to avoid a second query and include this result in the first one? Neo4j should find all 340 nodes before limiting them to 50 in the first query, so is there a way to intercept the node count before the LIMIT clause is applied and return it as well?
How about something like this. Order the result and put it in a collection. Then return the size of the collection and the first 50 items in the collection.
MATCH (n:MYNODE)
WHERE n.myfield CONTAINS {textToSearch}
WITH n
ORDER BY n.myfield
WITH COLLECT(n) as matched
RETURN size(matched), matched[..50]
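The same idea can be extended to paging. The sketch below assumes two extra parameters, $skip and $pageSize, which are hypothetical names and not part of the question. It also uses the $param parameter syntax, which replaced the older {param} form as of Neo4j 4.0.

```cypher
MATCH (n:MYNODE)
WHERE n.myfield CONTAINS $textToSearch
WITH n
ORDER BY n.myfield
// One row holding every match, in sorted order
WITH collect(n) AS matched
// total is the full match count; page is one slice of the sorted list
RETURN size(matched) AS total, matched[$skip..$skip + $pageSize] AS page
```

Keep in mind that this collects every matching node into one list before slicing, so it trades the second query for extra memory when the match count is large.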

neo4j indegree outdegree union

I want to compute Indegree and Outdegree and return a graph that has a connection between the top 5 Indegree nodes and the top 5 Outdegree nodes. I have written the following code:
match (a:Port1)<-[r]-()
return a.id as NodeIn, count(r) as Indegree
order by Indegree DESC LIMIT 5
union
match (n:Port1)-[r]->()
return n.id as NodeOut, count(r) as Outdegree
order by Outdegree DESC LIMIT 5
union
match p=(u:Port1)-[:LinkTo*1..]->(t:Port1)
where u.id in NodeIn and t.id in NodeOut
return p
I get an error as
All sub queries in an UNION must have the same column names (line 4, column 1 (offset: 99)) "union"
What are the changes that I need to do to the code?
There are a few things we can improve.
The matches you're doing aren't the most efficient way to get incoming and outgoing degrees for relationships.
Also, UNION can only be used to combine query results with identical columns. In this case, we won't even need UNION; we can use WITH to pipe results from one part of a query to another, and COLLECT() the nodes you need in between.
Try this query:
match (a:Port1)
with a, size((a)<--()) as Indegree
order by Indegree DESC LIMIT 5
with collect(a) as NodesIn
match (a:Port1)
with NodesIn, a, size((a)-->()) as Outdegree
order by Outdegree DESC LIMIT 5
with NodesIn, collect(a) as NodesOut
unwind NodesIn as NodeIn
unwind NodesOut as NodeOut
// we now have a cartesian product between both lists
match p=(NodeIn)-[:LinkTo*1..]->(NodeOut)
return p
Be aware that this performs two NodeLabelScans of :Port1 nodes and takes a cross product of the top 5 of each, so there are 25 variable-length path matches. This can be expensive, because it generates all possible paths from each NodeIn to each NodeOut.
If you only want the shortest connection between each pair, you might try replacing your variable-length match with a shortestPath() call, which returns only the shortest path found between each two nodes:
...
match p = shortestPath((NodeIn)-[:LinkTo*1..]->(NodeOut))
return p
Also, make sure your desired direction is correct. You're matching the nodes with the highest in-degree and then looking for outgoing paths from them to the nodes with the highest out-degree. That seems like it might be backwards to me, but you know your requirements best.
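Put together, the shortestPath() variant might look like the sketch below. It reuses the size() pattern syntax from the query above, which newer Neo4j versions may no longer accept. The NodeIn <> NodeOut filter is an addition of mine, not something the question requires: the same node can appear in both top-5 lists, and shortestPath() is not meant for identical start and end nodes.

```cypher
MATCH (a:Port1)
WITH a, size((a)<--()) AS Indegree
ORDER BY Indegree DESC LIMIT 5
WITH collect(a) AS NodesIn
MATCH (a:Port1)
WITH NodesIn, a, size((a)-->()) AS Outdegree
ORDER BY Outdegree DESC LIMIT 5
WITH NodesIn, collect(a) AS NodesOut
UNWIND NodesIn AS NodeIn
UNWIND NodesOut AS NodeOut
// Skip pairs where one node is in both top-5 lists
WITH NodeIn, NodeOut
WHERE NodeIn <> NodeOut
MATCH p = shortestPath((NodeIn)-[:LinkTo*1..]->(NodeOut))
RETURN p
```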

Limit the results of a union cypher query

Let's say we have the example query from the documentation:
MATCH (n:Actor)
RETURN n.name AS name
UNION
MATCH (n:Movie)
RETURN n.title AS name
I know that if I do that:
MATCH (n:Actor)
RETURN n.name AS name
LIMIT 5
UNION
MATCH (n:Movie)
RETURN n.title AS name
LIMIT 5
I can reduce the returned results of each sub query to 5. How can I LIMIT the total results of the union query?
This is not yet possible, but there is already an open neo4j issue that requests the ability to do post-UNION processing, which includes what you are asking about. You can add a comment to that neo4j issue if you support having it resolved.
This can be done using UNION post processing by rewriting the query using the COLLECT function and the UNWIND clause.
First we turn the columns of a result into a map (struct, hash, dictionary), to retain its structure. For each partial query we use the COLLECT to aggregate these maps into a list, which also reduces our row count (cardinality) to one (1) for the following MATCH. Combining the lists is a simple list concatenation with the “+” operator.
Once we have the complete list, we use UNWIND to transform it back into rows of maps. After this, we use the WITH clause to deconstruct the maps into columns again and perform operations like sorting, pagination, filtering or any other aggregation or operation.
The rewritten query will be as below:
MATCH (n:Actor)
with collect({name: n.name}) as row
MATCH (n:Movie)
with row + collect({name: n.title}) as rows
unwind rows as row
with row.name as name
return name LIMIT 5
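One difference worth noting: concatenating the two lists keeps duplicates, so the rewrite behaves like UNION ALL and not like UNION. A sketch of a variant that deduplicates and sorts before limiting is shown below. It assumes, as in the question, that actors expose name and movies expose title; the list names actorRows and rows are just illustrative.

```cypher
MATCH (n:Actor)
WITH collect({name: n.name}) AS actorRows
MATCH (n:Movie)
WITH actorRows + collect({name: n.title}) AS rows
UNWIND rows AS row
// DISTINCT restores the duplicate removal that UNION would have done
WITH DISTINCT row.name AS name
ORDER BY name
RETURN name
LIMIT 5
```

The ORDER BY is optional, but without it there is no guarantee which 5 names come back.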
This is possible as of Neo4j 4.0.0:
CALL {
MATCH (p:Person) RETURN p
UNION
MATCH (p:Person) RETURN p
}
RETURN p.name, p.age ORDER BY p.name
Read more about Post-union processing here https://neo4j.com/docs/cypher-manual/4.0/clauses/call-subquery/
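Applied to the Actor/Movie query from the question, a subquery-based version could look like this. It is a sketch that assumes Neo4j 4.0 or later, and the ORDER BY is my addition to make the 5 returned rows predictable.

```cypher
CALL {
  MATCH (n:Actor)
  RETURN n.name AS name
  UNION
  MATCH (n:Movie)
  RETURN n.title AS name
}
// Everything after the subquery operates on the combined result
RETURN name
ORDER BY name
LIMIT 5
```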
