I have a Neo4j database with thousands of nodes.
I'm using this query to find nodes that contain some text inside the desired field:
MATCH (n:MYNODE)
WHERE n.myfield CONTAINS {textToSearch}
RETURN n
ORDER BY n.myfield ASC
LIMIT 50
This query works, and returns the first 50 results ordered by n.myfield.
Let's say 340 nodes match the search criteria: the first 50 get returned. Is there a way to also return the total count? I would like to have the 50 nodes along with the total count (340) for display purposes.
I would do a second query like this:
MATCH (n:MYNODE)
WHERE n.myfield CONTAINS {textToSearch}
RETURN count(n)
Is there a way to avoid a second query and include this result in the first one? Neo4j has to find all 340 nodes before limiting them to 50 in the first query, so is there a way to intercept the node count before the LIMIT clause is applied and return it as well?
How about something like this: order the results and put them in a collection, then return the size of the collection and the first 50 items in it.
MATCH (n:MYNODE)
WHERE n.myfield CONTAINS {textToSearch}
WITH n
ORDER BY n.myfield
WITH COLLECT(n) as matched
RETURN size(matched), matched[..50]
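To make the steps of that query concrete, here is a minimal plain-Python sketch of the same logic (filter, order, collect, then size plus slice). The sample data and the function name search_page are made up for illustration; the code only models what the query computes and does not talk to Neo4j.

```python
def search_page(values, text_to_search, page_size=50):
    """Model of: WHERE ... CONTAINS, ORDER BY, COLLECT, size(), [..page_size]."""
    # Filter and sort, like WHERE + ORDER BY followed by COLLECT(n).
    matched = sorted(v for v in values if text_to_search in v)
    # size(matched) and matched[..page_size]
    return len(matched), matched[:page_size]


# Hypothetical data: 340 values contain "foo", 100 do not.
values = [f"foo-{i:03d}" for i in range(340)] + [f"bar-{i:03d}" for i in range(100)]
total, first_page = search_page(values, "foo")
print(total, len(first_page), first_page[0])  # 340 50 foo-000
```

Note that the whole matched list is built before it is sliced, which is the price of getting both values from one pass.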
match(m:master_node:Application)-[r]-(k:master_node:Server)-[r1]-(n:master_node)
where (m.name contains '' and (n:master_node:DeploymentUnit or n:master_node:Schema))
return distinct m.name,n.name
Hi, I am trying to get the total number of records for the above query. How can I change the query, using the count function, to get the record count directly?
Thanks in advance
The following query uses the aggregating function COUNT. Distinct pairs of m.name, n.name values are used as the "grouping keys".
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
RETURN m.name, n.name, COUNT(*) AS cnt
I assume that m.name contains '' in your query was an attempt to test for the existence of m.name. This query uses the EXISTS() function to test that more efficiently.
[UPDATE]
To determine the number of distinct n and m pairs in the DB (instead of the number of times each pair appears in the DB):
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
WITH DISTINCT m.name AS n1, n.name AS n2
RETURN COUNT(*) AS cnt
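The difference between the two queries (a count per pair versus the number of distinct pairs) can be illustrated with a small plain-Python sketch. The rows below are hypothetical (m.name, n.name) values, one per matched path; nothing here runs against Neo4j.

```python
from collections import Counter

# Hypothetical (m.name, n.name) rows, one per matched path.
rows = [
    ("app1", "unitA"),
    ("app1", "unitA"),
    ("app1", "schemaB"),
    ("app2", "unitA"),
]

# RETURN m.name, n.name, COUNT(*): one entry per distinct pair, with its frequency.
per_pair = Counter(rows)

# WITH DISTINCT ... RETURN COUNT(*): a single number, the count of distinct pairs.
distinct_pairs = len(set(rows))

print(per_pair[("app1", "unitA")], distinct_pairs)  # 2 3
```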
Some things to consider for speeding up the query even further:
Remove unnecessary label tests from the MATCH pattern. For example, can we omit the master_node label test from any nodes? In fact, can we omit all label testing for any nodes without affecting the validity of the result? (You will likely need a label on at least one node, though, to avoid scanning all nodes when kicking off the query.)
Can you add a direction to each relationship (to avoid having to traverse relationships in both directions)?
Specify the relationship types in the MATCH pattern. This will filter out unwanted paths earlier. Once you do so, you may also be able to remove some node labels from the pattern as long as you can still get the same result.
Use the PROFILE clause to evaluate the number of DB hits needed by different Cypher queries.
You can find examples of how to use count in the Neo4j docs.
In your case, the first example, where
count(*)
is used to return a count of each returned item, should work.
I am writing an API to return Neo4j data. In my case I get all matching nodes.
The API takes in userId, limit and offset, and returns a list of data matching that condition.
I found one solution, "Cypher to return total node count as well as a limited set", but it is pretty old. I'm not sure if this is still the best way to do it.
Performance is the same as firing 2 separate queries; at least then one of them would be cached by Neo4j after a couple of runs.
Match(u:WorkstationUser {id: "alw:44807"})-[:HAS_ACCESS_TO]->(p) return distinct(p) skip 0 limit 10
Match(u:WorkstationUser {id: "alw:44807"})-[:HAS_ACCESS_TO]->(p) return count(distinct(p))
I want the result to be something like
{
items: [ {}, {}], # query 1
total: 100, # query 2
limit: 10, # can get from input
skip: 0 # can get from input
}
This will depend a bit on how much information you need from the nodes for which you want the count, and whether you need to get distinct results or not.
If distinct results are not needed, and you don't need any additional filtering on the relationship or on the node at the other end (no filtering on that node's labels or properties), then you can take the size() of the pattern. This uses the degree information stored on the node, which is more efficient because the relationships never have to be expanded:
MATCH (u:WorkstationUser {id: "alw:44807"})
WITH u, size((u)-[:HAS_ACCESS_TO]->()) as total
MATCH (u)-[:HAS_ACCESS_TO]->(p)
RETURN p, total
SKIP 0 LIMIT 10
However, if distinct results are needed, or you need to filter the node by label or properties, then you will have to expand all the results to get the total. As long as there aren't too many results (that is, not millions or billions), you can collect the distinct nodes, get the size of the collection, then UNWIND the results and page:
MATCH (:WorkstationUser {id: "alw:44807"})-[:HAS_ACCESS_TO]->(p)
WITH collect(DISTINCT p) as pList
WITH pList, size(pList) as total
UNWIND pList as p
RETURN p, total
SKIP 0 LIMIT 10
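As a rough illustration of how the collect / size / UNWIND / SKIP / LIMIT steps map onto the response shape asked for in the question, here is a plain-Python sketch. The function name paginate and the node ids are hypothetical, and the code models the query logic only; it does not use a Neo4j driver.

```python
def paginate(nodes, skip=0, limit=10):
    """Model of: collect(DISTINCT p), size(pList), UNWIND, SKIP, LIMIT."""
    # collect(DISTINCT p): drop duplicates while keeping first-seen order.
    distinct = list(dict.fromkeys(nodes))
    return {
        "items": distinct[skip:skip + limit],  # SKIP ... LIMIT ...
        "total": len(distinct),                # size(pList)
        "limit": limit,
        "skip": skip,
    }


# Hypothetical node ids reached via HAS_ACCESS_TO, with duplicates.
reached = [f"p{i % 100}" for i in range(250)]
page = paginate(reached, skip=0, limit=10)
print(page["total"], page["items"][:3])  # 100 ['p0', 'p1', 'p2']
```

Because the total travels with every row in the Cypher version, the caller only needs to read it from the first row.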
I want to count how many labels there are in my graph, so I executed the following:
match (n) return (count(labels(n)))
The count returned by this statement isn't the same as the count I can see from the labels listed (highlighted in different colors) in the Browser. There are two more labels listed in the Browser than the count returned by the function.
Why is that?
Your query is getting the label collection for each node, and then counting how many collections there are, which is the same as the number of nodes.
To get a count of the number of labels in the DB, you can use the APOC procedure apoc.meta.stats, which returns a variety of DB statistics. For your specific case, you can do this:
CALL apoc.meta.stats() YIELD labelCount
RETURN labelCount;
This Cypher query returns each combination of node labels together with its node count:
match (n) return labels(n),count(n)
If you are seeking the count of a specific label, use
match (n:{your label}) return count(n)
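If you want the number of distinct labels in use, unwind each node's label list before counting (collecting labels(n) directly would count distinct label combinations instead):
match (n) unwind labels(n) as label return count(distinct label)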
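Three different numbers are easy to confuse in this thread: the number of nodes, the number of distinct label combinations, and the number of distinct labels. A small plain-Python sketch with a hypothetical five-node graph shows that they can all differ; each inner list stands for labels(n) of one node.

```python
# Hypothetical graph: each inner list is labels(n) for one node.
node_labels = [
    ["Person"],
    ["Person", "Admin"],
    ["Person"],
    ["Movie"],
    ["Movie", "Admin"],
]

# count(labels(n)): one label list per node, so this is the node count.
node_count = len(node_labels)

# Distinct label lists, i.e. distinct label combinations.
combinations = len({tuple(labels) for labels in node_labels})

# UNWIND the lists first, then count distinct individual labels.
distinct_labels = len({label for labels in node_labels for label in labels})

print(node_count, combinations, distinct_labels)  # 5 4 3
```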
If I want to perform a MATCH query on the top 25 results of a previous MATCH, how would I do this in Cypher?
My first query is like:
START inputMovie=node(0)
MATCH (inputGenre:Genre)-[:IS_GENRE]-(inputMovie:Movie)
WITH inputMovie, COLLECT(inputGenre) AS inputGenres
MATCH (genre:Genre)-[o:IS_GENRE]-(outMovies:Movie)
WITH inputGenres, outMovies, genre
WHERE (genre IN inputGenres)
RETURN outMovies.title, count(genre) AS foo
ORDER BY foo desc
LIMIT 25
How would I then perform a MATCH query on only those 25 results? Can I collect a number of nodes? COLLECT(m) would collect every single node matching the WHERE constraint. Is there a way to collect the top 25 from a MATCH query?
[EDITED]
First of all, your current query can be improved. The following shorter and faster query should be equivalent (and avoids using the deprecated START clause):
MATCH (outMovie:Movie)-[:IS_GENRE]-(:Genre)-[:IS_GENRE]-(inputMovie:Movie)
WHERE ID(inputMovie) = 0
MATCH (outMovie)-[:IS_GENRE]-(genre:Genre)
RETURN outMovie.title, count(genre) AS foo
ORDER BY foo DESC
LIMIT 25
In general, you can replace RETURN with WITH if you want to extend an existing query. You just have to provide an alias for returned values that do not already have an identifier (in your case, outMovies.title would need to be aliased).
Here is a simple example with your query:
MATCH (outMovie:Movie)-[:IS_GENRE]-(:Genre)-[:IS_GENRE]-(inputMovie:Movie)
WHERE ID(inputMovie) = 0
MATCH (outMovie)-[:IS_GENRE]-(genre:Genre)
WITH outMovie.title AS title, count(genre) AS foo
ORDER BY foo DESC
LIMIT 25
RETURN COUNT(title) AS num_titles;
The returned num_titles will be at most 25.
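The idea of cutting the result down to the top 25 and then continuing to work on just those rows can be modelled in plain Python as follows. The rows are hypothetical (one movie title per shared genre), and Counter.most_common stands in for the aggregation followed by ORDER BY ... DESC LIMIT 25.

```python
from collections import Counter

# Hypothetical rows: one movie title per genre shared with the input movie.
rows = [f"movie{i % 40}" for i in range(200)]

# WITH title, count(genre) AS foo ORDER BY foo DESC LIMIT 25
top = Counter(rows).most_common(25)

# Anything after the WITH only sees these rows,
# e.g. RETURN COUNT(title) AS num_titles
num_titles = len(top)
print(num_titles)  # 25
```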
Is it possible to extract in a single cypher query a limited set of nodes and the total number of nodes?
match (n:Molecule) with n, count(*) as nb limit 10 return {N: nb, nodes: collect(n)}
The above query properly returns the nodes, but returns 1 as number of nodes. I certainly understand why it returns 1, since there is no grouping, but can't figure out how to correct it.
The following query returns the counter for the entire number of rows (which I guess is what was needed). Then it matches again and limits your search, but the original counter is still available since it is carried through via the WITH-statement.
MATCH
(n:Molecule)
WITH
count(*) AS cnt
MATCH
(n:Molecule)
WITH
n, cnt LIMIT 10
RETURN
{ N: cnt, nodes:collect(n) } AS molecules
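The reason the original attempt reports 1 is that in WITH n, count(*) the variable n acts as the grouping key, so every group holds exactly one row. A plain-Python sketch of both variants, using 30 hypothetical molecule ids instead of real nodes:

```python
from collections import Counter

molecules = [f"mol{i}" for i in range(30)]  # hypothetical nodes

# WITH n, count(*) AS nb: grouped by n, so every count is 1.
grouped = Counter(molecules)

# WITH count(*) AS cnt: no grouping key, so one total for all rows.
cnt = len(molecules)

# Second MATCH plus LIMIT 10, with cnt carried along by the WITH.
result = {"N": cnt, "nodes": molecules[:10]}
print(set(grouped.values()), result["N"], len(result["nodes"]))  # {1} 30 10
```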
Here is an alternate solution:
match (n:Molecule) return {nodes: collect(n)[0..5], n: size(collect(n))}
84 ms for 30k nodes; shorter, but not as efficient as the one proposed above by wassgren.