I'm looking for an explanation of how this works and why it doesn't return the number of nodes in a path. Suppose I matched a path p. Now:
WITH p, count(nodes(p)) AS L1 RETURN L1
returns 1.
Once this is clear, how do I count a path's nodes properly?
count() is an aggregate function. When you use an aggregate function, result rows are grouped by every expression in the same WITH or RETURN clause that is not itself an aggregate. In this case, rows are grouped by p, and count(nodes(p)) is evaluated once per group.
nodes(p) returns a list of nodes, so count(nodes(p)) counts lists rather than the nodes inside them. Each p forms its own group containing a single list, so the result is 1.
To return the number of nodes in the path, use size(nodes(p)).
If you're only interested in the length of a path and not in the nodes it contains, I would encourage you to use length(p). It returns the number of relationships in the path, without having to access the nodes.
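As a minimal sketch of the two suggestions side by side (the variable-length pattern and the LIMIT are placeholders, not part of the original question; substitute your own MATCH):

```cypher
// Hypothetical pattern: any path of 1 to 3 relationships.
MATCH p = (a)-[*1..3]->(b)
RETURN size(nodes(p)) AS nodeCount,  // number of nodes in this path
       length(p)      AS relCount    // number of relationships in this path
LIMIT 5
```

For each path, nodeCount should come out as relCount + 1, because a path with k relationships passes through k + 1 node positions.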
I have a query that I am trying to execute. The query works, but there isn't an option to see this data in graph format. Instead the data is returned in table/text format.
When I simplify the query, the output is displayed in graph format, and I have no idea why.
This is the query that is giving me the issue:
MATCH (p:Person)-[hi:hasIdentity]->(i:Identity)
MATCH (j:Person)-[hi2:hasIdentity]->(i2:Identity)
MATCH (i)-[bl:Linked]->(i2)
WHERE NOT p=j
return DISTINCT(p.id), COUNT(DISTINCT(j))
LIMIT 5
Does anyone have any idea why that might be the case?
You'll need to return variables associated with nodes and/or relationships for it to display as a graph. As it is now you're returning properties of nodes (p.id), probably integers or strings. Try this return instead:
...
RETURN p, COUNT(DISTINCT j)
LIMIT 5
By the way, DISTINCT isn't a function, so it needs no parentheses. Also, when a RETURN or WITH contains an aggregation, you don't need DISTINCT on that line: the non-aggregated variables act as the grouping key for the aggregation, so they are already distinct.
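Putting both remarks together, the full query might look like the sketch below. The alias linkedPeople is made up for illustration, and p <> j is simply another way of writing NOT p = j:

```cypher
MATCH (p:Person)-[:hasIdentity]->(i:Identity)
MATCH (j:Person)-[:hasIdentity]->(i2:Identity)
MATCH (i)-[:Linked]->(i2)
WHERE p <> j
// p is the grouping key, so each Person appears once without DISTINCT.
// DISTINCT inside count() is still useful: the same j can be reached
// through several identity pairs.
RETURN p, count(DISTINCT j) AS linkedPeople
LIMIT 5
```

Because p is a node rather than a property, the Browser should be able to offer the graph view for this result.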
I want to count how many labels are in my graph, so I execute the following:
match (n) return (count(labels(n)))
The count returned by this statement isn't the same as the number of labels I can see listed, highlighted in different colors, in the Browser. The Browser lists two more labels than the count returned by the function.
Why is that?
Your query is getting the label collection for each node, and then counting how many collections there are, which is the same as the number of nodes.
To get a count of the number of labels in the DB, you can use the APOC procedure apoc.meta.stats, which returns a variety of DB statistics. For your specific case, you can do this:
CALL apoc.meta.stats() YIELD labelCount
RETURN labelCount;
This Cypher query returns each distinct combination of node labels together with the number of nodes that have it:
match (n) return labels(n),count(n)
If you are seeking the count of a specific label, use
match (n:{your label}) return count(n)
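If you want the number of distinct labels, unwind the label lists first (collecting distinct labels(n) values would count distinct label combinations instead):
match (n) unwind labels(n) as label return count(distinct label)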
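For a breakdown per individual label (rather than per label combination), one possible sketch is to unwind each node's label list before aggregating. The alias nodeCount is just illustrative, and the query scans every node, so it may be slow on a large graph:

```cypher
MATCH (n)
UNWIND labels(n) AS label        // one row per (node, label) pair
RETURN label, count(*) AS nodeCount
ORDER BY nodeCount DESC
```

A node that carries two labels contributes to two rows, so the nodeCount values can add up to more than the total number of nodes.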
MATCH (p:Product), (s:Student), (b:Boy), (a:Attribute)
RETURN count(distinct(p)), count(distinct(s)), count(distinct(b)), count(distinct(a))
I want to know the count of each node type in the graph using this query. However, the Neo4j Browser gives a warning saying that this query produces a cartesian product. Is there a better way to write the query?
Yes. You want to make sure your query uses the NodeCountFromCountStore operator (you can view this in the query plan if you EXPLAIN the query, so you can check before you actually execute).
The tricky part is that this plan is only used if you match all nodes of a label and then get the count (no other variables in your WITH or RETURN!).
You can try this approach, which unions queries together and keeps the NodeCountFromCountStore operator by adding the label column after you get the count:
match (n:Product)
with count(n) as count
return 'Product' as label, count
union all
match (n:Student)
with count(n) as count
return 'Student' as label, count
union all
match (n:Boy)
with count(n) as count
return 'Boy' as label, count
union all
match (n:Attribute)
with count(n) as count
return 'Attribute' as label, count
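To check the plan before running anything, you can prefix a single branch with EXPLAIN. This is only a sketch using the Product label from the question; which operator actually appears depends on your Neo4j version and planner:

```cypher
// EXPLAIN compiles the query and shows its plan without executing it.
// With a single label and nothing but the count returned, the plan
// is expected to contain NodeCountFromCountStore.
EXPLAIN
MATCH (n:Product)
RETURN count(n) AS count
```

If the plan shows a label scan followed by an aggregation instead, the count is being computed by visiting the nodes rather than read from the count store.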
To get a variety of statistics for your DB, including a count of the number of nodes for every label, you can use the APOC procedure apoc.meta.stats.
The following query gets just the label node counts, returning a map of label names to node counts:
CALL apoc.meta.stats() YIELD labels
RETURN labels;
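Since labels is a map, individual counts can be read from it by key. A small sketch, assuming APOC is installed and using the four label names from the question (the column aliases are made up):

```cypher
CALL apoc.meta.stats() YIELD labels
// Map lookup by key; a label that is not in the map yields null.
RETURN labels.Product   AS products,
       labels.Student   AS students,
       labels.Boy       AS boys,
       labels.Attribute AS attributes
```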
I am using a query like
MATCH p=((:Start)-[:NEXT*..100]->(n))
WHERE ALL(node IN nodes(p) WHERE ...)
WITH DISTINCT n WHERE (n:RELEVANT)
...
RETURN n.someprop;
I want the results ordered by the natural ordering arising from the direction of the -[:NEXT]-> relationships.
But the WITH in the third line scrambles that ordering. The problem is that I need the WITH to (1) filter for :RELEVANT nodes and (2) get only distinct such nodes.
Is there some way to preserve the ordering? Maybe assign a sequence number along the path and reuse it later with ORDER BY? I have no idea how to do that.
You're asking for distinct nodes, which indicates that the node might be reachable by multiple paths, and thus might be present at multiple distances from the start node.
Instead of using DISTINCT, you should use min() (or max(), depending on your requirements) on the path length for each n. Since those are aggregation functions, you will only ever get a single row for each n.
MATCH p=((:Start)-[:NEXT*..100]->(n:RELEVANT))
WHERE ALL(node IN nodes(p) WHERE ...)
WITH n, min(length(p)) as distance
WITH n
ORDER BY distance
...
RETURN n.someprop;
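If it helps to see the ordering key, a variant can return the distance next to the property. This sketch leaves out the elided ALL(...) filter and the elided middle section of the original query, so it illustrates the ordering only and is not a drop-in replacement:

```cypher
MATCH p = (:Start)-[:NEXT*..100]->(n:RELEVANT)
// One row per n: the aggregation collapses all paths that reach the same node.
WITH n, min(length(p)) AS distance
RETURN n.someprop, distance
ORDER BY distance
```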
What if you remove the WHERE clause from the WITH and put the :RELEVANT label in the MATCH instead? Maybe the WHERE is causing the problem. Try something like this:
MATCH p=((:Start)-[:NEXT*..100]->(n:RELEVANT))
WHERE ALL(node IN nodes(p) WHERE ...)
WITH DISTINCT n
...
RETURN n.someprop;
I'm trying to find the number of nodes of a certain kind in my database that are connected to more than one other node of another kind. In my case, it's place nodes connected to several name nodes. I have a query that works:
MATCH rels=(p:Place)-[c:Called]->(n:Name)
WITH p,count(n) as counts
WHERE counts > 1
RETURN p;
However, that only returns the place nodes, and ideally I'd like it to return all the nodes and edges involved. I've found a question on returning variables from before the WITH, but if I include any of the other variables I've defined, the query returns no results. For example, this query returns nothing:
MATCH rels=(p:Place)-[c:Called]->(n:Name)
WITH p, count(n) as counts, rels
WHERE counts > 1
RETURN p;
I don't know how to return the information that I want without changing the results of the query. Any help would be much appreciated
Your second query returns nothing because its WITH clause uses both p and rels as aggregation "grouping keys". Since each rels path contains only a single n value, counts is always 1, and the WHERE counts > 1 filter removes every row.
Something like this might work for you:
MATCH path=(p:Place)-[:Called]->(:Name)
WITH p, COLLECT(path) as paths
WHERE SIZE(paths) > 1
RETURN p, paths;
This returns each matching Place node and all its paths.
Try this:
MATCH (p:Place)-[c:Called]->(n:Name)
WHERE size((p)-[:Called]->(:Name)) > 1
WITH p,count(n) as counts, collect(n) AS names, collect(c) AS calls
RETURN p, names, calls, counts ORDER BY counts DESC;
This query uses Cypher's collect() function to create lists of the names and Called relationships for each place that has more than one Called relationship with a Name node.
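One caveat, stated as an assumption about your setup: on Neo4j 5 and later, size() applied directly to a pattern is, as far as I know, no longer accepted, and a COUNT subquery is the usual replacement. A sketch of the same query in that style:

```cypher
// Assumes Neo4j 5+, where COUNT { ... } subqueries are available.
MATCH (p:Place)-[c:Called]->(n:Name)
WHERE COUNT { (p)-[:Called]->(:Name) } > 1
WITH p, count(n) AS counts, collect(n) AS names, collect(c) AS calls
RETURN p, names, calls, counts
ORDER BY counts DESC
```

On older versions the size(...) form shown above should still be the one to use.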