How to find all labels that contain string in neo4j - neo4j

Trying to get all nodes of a certain label type. I have roots of multiple graphs that all have the same suffix in their labels. For example, i have 3 nodes that all have treeroot at the end of their label. So I might have companytreeroot, buildingtreeroot, nd employeetreeroot as 3 valid labels for 3 distinct nodes. How would I get all nodes whose label has that pattern?
I tried:
match (n) where '.*treeroot' in labels(n) return n
and
match (n) where 'treeroot' in labels(n) return n
but both return empty sets...

stdob--'s answer works, but it must inspect all labels of all nodes in your graph, so this becomes more and more expensive as your graph grows.
A faster approach involves first finding the labels that match quickly by using the db.labels() procedure, then (because Cypher does not natively support dynamic label queries) use APOC Procedures' cypher.run() procedure to use String concatenation to assemble a query that finds all nodes in all the labels that met your match.
Here's an example that should be quite fast, even on large graphs:
CALL db.labels() YIELD label
WITH label
WHERE label ENDS WITH 'treeroot'
CALL apoc.cypher.run('MATCH (n:' + label + ') return n', null) YIELD value
RETURN value.n as node

You can use ANY function for apply reqular expression to labels:
match (n) where ANY(l in labels(n) WHERE l =~ ".*treeroot")
return n

Related

Get the nodes with specific labels in neo4j

I want to get all the nodes that have specific labels. My code:
match (n) where labels(n)=["Person","Actor","Old"] return n
While there are nodes that satisfy this property I do not get any results.
This is why:
Labels(n) returns an array of strings, but not necessarily in a specific order.
You can try this:
WHERE n:Person AND n:Actor AND n:Old
Or
WHERE ALL(l in [“Person”, “Actor”, “Old”] WHERE l IN labels(n) )
List does not match if the items are not in the same order/sequence. So first of all, list out all labels in your database so you can see how the labels are arranged.
match (n)
return distinct labels(n)
Then you will see which node will have those labels that you look for: ["Person","Actor","Old"].
If you are trying to find nodes where it contains all nodes in that list in any order then this query will work for you.
match (n)
where all(lbl in ["Person","Actor","Old"] where lbl in labels(n))
return n
if you like using APOC functions, here is the query
match (n)
where apoc.coll.isEqualCollection(["Person","Actor","Old"], labels(n))
return n

Optimize Cypher query on multiple unique labels

In Neo4j, is it faster to run a query against all nodes (AllNodesScan) and then filter on their labels with a WHERE clause, or to run multiple queries with a NodeByLabelScan?
To illustrate, I want all nodes that are labeled with one of the labels in label_list:
label_list = ['label_1', 'label_2', ...]
Which would be faster in an application (this is pseudo-code):
for label in label_list:
run.query("MATCH (n:{label}) return n")
or
run.query("MATCH (n) WHERE (n:label_1 or n:label_2 or ...)")
EDIT:
Actually, I just realized that the best option might be to run multiple NodeByLabelScan in a single query, with something looking like this:
MATCH (a:label_1)
MATCH (b:label_2)
...
UNWIND [a, b ..] as foo
RETURN foo
Could someone speak to it?
Yes, it would be better to run multiple NodeByLabelScans in a single query.
For example:
OPTIONAL MATCH (a:label_1)
WITH COLLECT(a) AS list
OPTIONAL MATCH (b:label_2)
WITH list + COLLECT(b) AS list
OPTIONAL MATCH (c:label_3)
WITH list + COLLECT(c) AS list
UNWIND list AS n
RETURN DISTINCT n
Notes on the query:
It uses OPTIONAL MATCH so that the query can proceed even if a wanted label is not found in the DB.
It uses multiple aggregation steps to avoid cartesian products (also see this).
And it uses UNWIND so that it can useDISTINCT to return distinct nodes (since a node can have multiple labels).

Cypher query (or APOC procedure) that samples all labels and returns graph representing x% of nodes

I am working with a graph that has many object types (e.g. LABELS).
I would like to be able to run a query that samples every label, and returns a small but representative set of data containing nodes (and relationships) for each label. Has anyone seen or achieved this?
Thanks, John
This returns for each label, five nodes associated with this label :
call db.labels() yield label
call apoc.cypher.run("match (x:`"+label+"`) RETURN x LIMIT 5", null) yield value
return label, collect(value.x) AS nodes
Without knowing your model you could display your complete label structure as graph by the Cypher statement CALL apoc.meta.graph();.
For a representative set of data for each label we should know your underlying model or rather labels. I could imagine a solution based on the Limit clause:
MATCH (n)
OPTIONAL MATCH (n)-[r]-()
WITH n, r
LIMIT 5000
RETURN n, r;

Neo4j indices slow when querying across 2 labels

I've got a graph where each node has label either A or B, and an index on the id property for each label:
CREATE INDEX ON :A(id);
CREATE INDEX ON :B(id);
In this graph, I want to find the node(s) with id "42", but I don't know a-priori the label. To do this I am executing the following query:
MATCH (n {id:"42"}) WHERE (n:A OR n:B) RETURN n;
But this query takes 6 seconds to complete. However, doing either of:
MATCH (n:A {id:"42"}) RETURN n;
MATCH (n:B {id:"42"}) RETURN n;
Takes only ~10ms.
Am I not formulating my query correctly? What is the right way to formulate it so that it takes advantage of the installed indices?
Here is one way to use both indices. result will be a collection of matching nodes.
OPTIONAL MATCH (a:B {id:"42"})
OPTIONAL MATCH (b:A {id:"42"})
RETURN
(CASE WHEN a IS NULL THEN [] ELSE [a] END) +
(CASE WHEN b IS NULL THEN [] ELSE [b] END)
AS result;
You should use PROFILE to verify that the execution plan for your neo4j environment uses the NodeIndexSeek operation for both OPTIONAL MATCH clauses. If not, you can use the USING INDEX clause to give a hint to Cypher.
You should use UNION to make sure that both indexes are used. In your question you almost had the answer.
MATCH (n:A {id:"42"}) RETURN n
UNION
MATCH (n:B {id:"42"}) RETURN n
;
This will work. To check your query use profile or explain before your query statement to check if the indexes are used .
Indexes are formed and and used via a node label and property, and to use them you need to form your query the same way. That means queries w/out a label will scan all nodes with the results you got.

neo4j cypher, performance on specifying labels when id(n) is in conditions

has anyone tested/knows if - when querying a Neo4j database with Cypher - specifying the
MATCH node:labels
makes the selection faster even if
WHERE id(node) = x
is in place?
MATCH (n)
WHERE ID(n) = {x}
RETURN n
should be negligibly faster than
MATCH (n:MyLabel)
WHERE ID(n) = {x}
RETURN n
Both queries first get the node by internal id but while the first query returns the second filters the result on hasLabel(n:MyLabel).
It's a good example of where its possible to overuse labels. Similarly, if for the pattern (a:Person {name:"Étienne Gilson"})-[:FRIEND]->(b:Person)-[:FRIEND]->(c:Person) I know that only :Person nodes have -[:FRIEND]- relationships, there is no point in filtering b and c on that label. If a remote node should be retrieved from index then the label should be included to indicate that, i.e. -[:FRIEND]->(b:Person {name:"Jacques Maritain"}), but when no (indexed) property is included in that part of the pattern, the nodes will be reached by traversal and if only people have friends an additional filter on hasLabel(b:Person) would be pointless.
I understand this blog post to mean that filtering on a label is as cheap as a bitwise &, so performance difference should be very small.

Resources