Optimize Cypher query on multiple unique labels - neo4j

In Neo4j, is it faster to run a query against all nodes (AllNodesScan) and then filter on their labels with a WHERE clause, or to run multiple queries with a NodeByLabelScan?
To illustrate, I want all nodes that are labeled with one of the labels in label_list:
label_list = ['label_1', 'label_2', ...]
Which would be faster in an application (this is pseudo-code):
for label in label_list:
run.query("MATCH (n:{label}) return n")
or
run.query("MATCH (n) WHERE (n:label_1 or n:label_2 or ...)")
EDIT:
Actually, I just realized that the best option might be to run multiple NodeByLabelScan in a single query, with something looking like this:
MATCH (a:label_1)
MATCH (b:label_2)
...
UNWIND [a, b ..] as foo
RETURN foo
Could someone speak to it?

Yes, it would be better to run multiple NodeByLabelScans in a single query.
For example:
OPTIONAL MATCH (a:label_1)
WITH COLLECT(a) AS list
OPTIONAL MATCH (b:label_2)
WITH list + COLLECT(b) AS list
OPTIONAL MATCH (c:label_3)
WITH list + COLLECT(c) AS list
UNWIND list AS n
RETURN DISTINCT n
Notes on the query:
It uses OPTIONAL MATCH so that the query can proceed even if a wanted label is not found in the DB.
It uses multiple aggregation steps to avoid cartesian products (also see this).
And it uses UNWIND so that it can useDISTINCT to return distinct nodes (since a node can have multiple labels).

Related

How to find all nodes between two nodes in graph database?

I`m using Cypher query language and I need to find nodes between node A and E. (A->B->C->D->E)
Next query returns all nodes including A and E, but i need to exclude them, to have B, C, D nodes. How can I filter my query result?
MATCH p= (A:City{name: 'City1'})-[:LINKED*]->(E:City{name: "City5"}) return nodes(p)
There are a number of ways you might do this, but a simple option is to just index into the node list using:
return nodes(p)[1..-1]

Neo4j count Query

match(m:master_node:Application)-[r]-(k:master_node:Server)-[r1]-(n:master_node)
where (m.name contains '' and (n:master_node:DeploymentUnit or n:master_node:Schema))
return distinct m.name,n.name
Hi,I am trying to get total number of records for the above query.How I change the query using count function to get the record count directly.
Thanks in advance
The following query uses the aggregating funtion COUNT. Distinct pairs of m.name, n.name values are used as the "grouping keys".
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
RETURN m.name, n.name, COUNT(*) AS cnt
I assume that m.name contains '' in your query was an attempt to test for the existence of m.name. This query uses the EXISTS() function to test that more efficiently.
[UPDATE]
To determine the number of distinct n and m pairs in the DB (instead of the number of times each pair appears in the DB):
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
WITH DISTINCT m.name AS n1, n.name AS n2
RETURN COUNT(*) AS cnt
Some things to consider for speeding up the query even further:
Remove unnecessary label tests from the MATCH pattern. For example, can we omit the master_node label test from any nodes? In fact, can we omit all label testing for any nodes without affecting the validity of the result? (You will likely need a label on at least one node, though, to avoid scanning all nodes when kicking off the query.)
Can you add a direction to each relationship (to avoid having to traverse relationships in both directions)?
Specify the relationship types in the MATCH pattern. This will filter out unwanted paths earlier. Once you do so, you may also be able to remove some node labels from the pattern as long as you can still get the same result.
Use the PROFILE clause to evaluate the number of DB hits needed by different Cypher queries.
You can find examples of how to use count in the Neo4j docs here
In your case the first example where:
count(*)
Is used to return a count of each returned item should work.

neo4j cypher keep ordering imposed by path for later in the query

I am using a query like
MATCH p=((:Start)-[:NEXT*..100]->(n))
WHERE ALL(node IN nodes(p) WHERE ...)
WITH DISTINCT n WHERE (n:RELEVANT)
...
RETURN n.someprop;
Where I want to have the results ordered by the natural ordering arising from the direction of the -[:NEXT]-> relationships.
But the WITH in the third line scrambles up that ordering. Problem is, I need the with to 1. filter for :RELEVANT nodes and 2. to get only distinct such nodes.
Is there some way to preserve the ordering? Maybe assign number ordering on the path and reuse it later with ORDER BY? No idea how to do it.
You're asking for distinct nodes, which indicates that the node might be reachable by multiple paths, and thus might be present at multiple distances from the start node.
Instead of using DISTINCT, you should use min() (or max(), depending on your requirements) on the path length for each n. Since those are aggregation functions, you will only ever get a single row for each n.
MATCH p=((:Start)-[:NEXT*..100]->(n:RELEVANT))
WHERE ALL(node IN nodes(p) WHERE ...)
WITH n, min(length(p)) as distance
WITH n
ORDER BY distance
...
RETURN n.someprop;
And if you remove the WHERE clause from WITH and put the label :RELEVANT in the MATCH? Maybe the WHERE is causing the problem... Try something this:
MATCH p=((:Start)-[:NEXT*..100]->(n:RELEVANT))
WHERE ALL(node IN nodes(p) WHERE ...)
WITH DISTINCT n
...
RETURN n.someprop;

How to write cypher statement to combine nodes when an OPTIONAL MATCH is null?

Background
Hi all, I am currently trying to write a cypher statement that allows me to find a set of paths on a map from a starting point. I want my search result to always return connecting streets within 5 nodes. Optionally, if there's a nearby hospital, I would like my search pattern to also indicate nearby hospitals.
Main Problem
Because there isn't always a nearby hospital to the current street, sometimes my optional match search pattern comes back as null. Here's the current cypher statement I'm using:
MATCH path=(a:Street {id: 123})-[:CONNECTED_TO*..5]-(b:Street)
OPTIONAL MATCH optionalPath=(b)-[:CONNECTED_TO]->(hospital:Hospital)
WHERE ALL (x IN nodes(path) WHERE (x:Street))
WITH DISTINCT nodes(path) + nodes(optionalPath) as n
UNWIND n as nodes
RETURN DISTINCT nodes;
However, this syntax only works if optionalPath contains nodes. If it doesn't, the statement nodes(path) + nodes(optionalPath) is an operation adding null and I get no records. This is true even the nodes(path) term does contain nodes.
What's the best way to get around this problem?
You can use COALESCE to replace a NULL with some other value. For example:
MATCH path=(:Street {id: 123})-[:CONNECTED_TO*..5]-(b:Street)
WHERE ALL (x IN nodes(path) WHERE x:Street)
OPTIONAL MATCH optionalPath=(b)-[:CONNECTED_TO]->(hospital:Hospital)
WITH nodes(path) + COALESCE(nodes(optionalPath), []) as n
UNWIND n as nodes
RETURN DISTINCT nodes;
I have also made a few other improvements:
The WHERE clause was moved up right after the first MATCH. This eliminates the unwanted path values immediately. Your original query would get all path values (even unwanted ones) and always the perform the second MATCH query, and only eliminate unwanted paths afterwards. (But, it is actually not clear if you even need the WHERE clause at all; for example, if the CONNECTED_TO relationship is only used between Street nodes.)
The DISTINCT in your WITH clause would have prevented duplicate n collections, but the collections internally could have had duplicate paths. This was probably not what you wanted.
It seems you don't really want the path, just all the street nodes within 5 steps, plus any connected hospitals. So I would simplify your query to just that, and then condense the 3 columns down to 1.
MATCH (a:Street {id: 123})-[:CONNECTED_TO*..5]-(b:Street)
OPTIONAL MATCH (b)-[:CONNECTED_TO]->(hospital:Hospital)
WITH collect(a) + collect(b) + collect(hospital) as n
UNWIND n as nodez
RETURN DISTINCT nodez;
If Streets can be indirectly connected (hospital in between), Than I'd adjust like this
MATCH (a:Street {id: 123})-[:CONNECTED_TO]-(b:Street)
WITH a as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
OPTIONAL MATCH (b)-[:CONNECTED_TO]->(hospital:Hospital)
WITH nodez + collect(hospital) as n
UNWIND n as nodez
RETURN DISTINCT nodez;
It's a bit more verbose, but just says exactly what you want (and also adds the start node to the hospital check list)

Including vars in Neo4j WITH statement changes query output

I'm trying to find the number of nodes of a certain kind in my database that are connected to more than one other node of another kind. In my case, it's place nodes connected to several name nodes. I have a query that works:
MATCH rels=(p:Place)-[c:Called]->(n:Name)
WITH p,count(n) as counts
WHERE counts > 1
RETURN p;`
However, that only returns the place nodes, and ideally I'd like it to return all the nodes and edges involved. I've found a question on returning variables from before the WITH, but if I include any of the other variables I've defined, the query returns no responses, i.e. this query returns nothing:
MATCH rels=(p:Place)-[c:Called]->(n:Name)
WITH p, count(n) as counts, rels
WHERE counts > 1
RETURN p;
I don't know how to return the information that I want without changing the results of the query. Any help would be much appreciated
The reason your second query returns nothing is because its WITH clause specifies as aggregation "grouping keys" both p and rels. Since each rels path has only a single n value, counts would always be 1.
Something like this might work for you:
MATCH path=(p:Place)-[:Called]->(:Name)
WITH p, COLLECT(path) as paths
WHERE SIZE(paths) > 1
RETURN p, paths;
This returns each matching Place node and all its paths.
Try this:
MATCH (p:Place)-[c:Called]->(n:Name)
WHERE size((p)-[:Called]->(:Name)) > 1
WITH p,count(n) as counts, collect(n) AS names, collect(c) AS calls
RETURN p, names, calls, counts ORDER BY counts DESC;
This query makes use of Cypher's collect() function to create lists of the names and called relationships for each place that has more than Called relationship with a Name node.

Resources