I am trying to write a query in Cypher that returns all leaf nodes given a specific root node.
Right now I have been using:
MATCH (root:Node {name: 'Name'})<-[:REL *]-(leaf:Node)
WHERE NOT (leaf)<-[:REL]-()
RETURN leaf
The problem with this query is that it slows down dramatically as the database grows, because every candidate leaf node connected to my root has to be checked in the NOT clause. To omit the NOT clause, I can return the entire path like this:
MATCH p=(root:Node {name: 'Name'})<-[:REL *]-(leaf:Node)
RETURN p
The second query is a lot faster as the number of nodes/relationships in the graph increases, but I would prefer to return just the leaf nodes instead of the path.
Is there a way to run this query more efficiently on a larger data set?
Have you tried this?
MATCH (root:Node {name: 'Name'})<-[:REL *]-(leaf:Node)
WITH leaf
WHERE NOT (leaf)<-[:REL]-()
RETURN leaf
On a similar case for me it gives the lowest number of db hits.
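To make explicit what the leaf query computes, here is a minimal in-memory sketch in Python. It is not Neo4j code: the `incoming` dict (mapping a node name to the nodes that point at it via REL), the function name and the sample data are all assumptions made for illustration. It visits each reachable node once and keeps those with no incoming REL of their own.

```python
from collections import deque

def leaves_under(root, incoming):
    """Return nodes reachable from root over incoming REL edges
    that have no incoming REL edge themselves."""
    seen = {root}
    queue = deque([root])
    leaves = []
    while queue:
        node = queue.popleft()
        children = incoming.get(node, [])
        # A leaf is a reachable node (other than the root) with nothing pointing at it.
        if not children and node != root:
            leaves.append(node)
        for child in children:
            if child not in seen:  # visit each node once, even if many paths reach it
                seen.add(child)
                queue.append(child)
    return sorted(leaves)

# Hypothetical sample graph: a and b point at root, c and d point at a, d points at b.
incoming = {"root": ["a", "b"], "a": ["c", "d"], "b": ["d"]}
result = leaves_under("root", incoming)
```

Note that node d is reported once even though two paths lead to it, whereas a variable-length MATCH yields one row per path unless DISTINCT is applied.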
Related
[Edit] I'm using Neo4j 4.2.1
I have this need for a Cypher query that brings back a complete tree given its root node. All nodes and relationships must be fetched and present only once in the returned sets. Here's what I have come to:
MATCH p = (n)-[*..]->(m)
WHERE id(n) = 0
WITH relationships(p) AS r
WITH distinct last(r) as rel
WITH [node IN [startNode(rel), endNode(rel)] | node] AS tmp, rel
UNWIND tmp AS node
RETURN collect(DISTINCT node) AS nodes, collect(distinct rel) AS relationships;
Running the query on our database to get about 820 nodes makes the thing crash for lack of memory (5 GB allowed). Hard to believe. So I'm wondering: is this query ill-born? Is there a technique I'm using that shouldn't be used for my purpose?
I strongly recommend that you come up with a node property that is guaranteed to be the same on all the nodes in a contiguous tree, if you don't have one already. I'll call that property same_prop. Here's what I do to run queries like the one you're running:
1. Index same_prop. If you have different node labels, then you need this index created for each different node label you expect to have in the tree.
CREATE INDEX samepropnode FOR (n:your_label) ON (n.same_prop)
is the kind of thing you need in Neo4j 4+. In Neo4j, indices are cheap, and can sometimes speed up queries quite a bit.
2. Collect all possible values of same_prop and store them in a text file (I use tab-separated values as safer than comma-separated values).
3. Use the Python driver, or your language of choice that has a Neo4j driver written (strongly recommend Neo4j-provided drivers, not third-party) to write wrapper code that executes a Cypher query something like this:
MATCH (p:your_label)-->(c)
USING INDEX p:your_label(same_prop)
WHERE p.same_prop IN [ same_prop_list ]
RETURN DISTINCT
p.datapiece1 AS `first_parent_datapiece`,
p.datapiecen AS `nth_parent_datapiece`,
c.datapiece1 AS `first_child_datapiece`,
c.datapiecen AS `nth_child_datapiece`
It's not a good idea, in general, to return nodes and relationships unless you're debugging.
Then in your Python (for example) code, you're simply going to read in all your same_prop values from the file you got in Step 2, chunk up the values in reasonable size chunks, maybe 1,000 or 10,000, and substitute them in for the [ same_prop_list ] in the Cypher query on-the-fly.
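The chunking step could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names are made up, the query is a shortened version of the one above, and it passes each chunk as a query parameter (`$same_prop_list`) rather than substituting text into the query string, which is a swap from the substitution described in the answer. Actually sending each batch through a driver session is left out.

```python
def chunked(values, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(values), size):
        yield values[start:start + size]

# Shortened form of the query above; the label and property names are placeholders.
QUERY = (
    "MATCH (p:your_label)-->(c) "
    "WHERE p.same_prop IN $same_prop_list "
    "RETURN DISTINCT p.datapiece1 AS first_parent_datapiece, "
    "c.datapiece1 AS first_child_datapiece"
)

def build_batches(values, size=1000):
    """Pair the query text with one parameter dict per chunk."""
    return [(QUERY, {"same_prop_list": chunk}) for chunk in chunked(values, size)]

# Hypothetical same_prop values standing in for the contents of the text file.
batches = build_batches([f"tree-{i}" for i in range(2500)], size=1000)
```

Each `(query, parameters)` pair can then be executed in its own transaction, so no single call has to hold the whole result in memory.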
I have a huge (non cyclic) graph and want to find all nodes reachable by relation X from a given node. However, I do not want to cross a node having a certain attribute {attr:'donotcross'} as this represents a choke point I do not want to cross (i.e. this is the only node leading to an adjacent subgraph).
Currently I do breadth first search myself using a trivial Cypher query to isolate neighboring nodes and python, stopping the recursion as soon as I reach that specific node. This, however, is really slow and I think that using pure Cypher to isolate those nodes could be faster.
What does the Cypher query look like returning all connected nodes via X but not traversing a node with property attr:'donotcross'?
My intuition would be
MATCH (n)-[:X*]->(inter)-[:X*]->(m) WHERE NOT inter.attr = 'donotcross' RETURN m
With n being the start node. However, this does not work, because the pattern can still match a path containing a forbidden node whenever there is more than just the forbidden node between the start and target nodes.
Using Cypher alone, you can use the following approach:
MATCH path = (n)-[:X*]->(m) // best to use a label!
WHERE none(node in nodes(path) WHERE node.attr = 'donotcross')
RETURN DISTINCT m
Keep in mind you should at least be using labels for your starting node n, if you aren't able to look them up by an indexed property for a specific label.
Also, if there are relatively few of these donotcross nodes, and if there is an index on the label of these nodes on attr, then it may be faster to first match on these nodes, collect them, then filter based on that:
MATCH (x) // best to use a label and index lookup!
WHERE x.attr = 'donotcross'
WITH collect(x) as excluded
MATCH path = (n)-[:X*]->(m) // best to use a label!
WHERE none(node in nodes(path) WHERE node in excluded)
RETURN DISTINCT m
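For comparison with the client-side breadth-first search mentioned in the question, the filter these queries apply can be sketched in memory like this. The `outgoing` adjacency dict, the `blocked` set and the function name are assumptions for illustration, not part of any Neo4j API. Like the none(...) predicate, the sketch neither returns nor expands a forbidden node, and returns nothing if the start node itself is forbidden.

```python
from collections import deque

def reachable_without_crossing(start, outgoing, blocked):
    """Return nodes reachable from start over X edges without touching a blocked node."""
    if start in blocked:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in outgoing.get(node, []):
            # Skip forbidden nodes entirely: they are not reported and not expanded.
            if nxt in blocked or nxt in seen:
                continue
            seen.add(nxt)
            queue.append(nxt)
    seen.discard(start)  # the start node is not part of the result (acyclic graph assumed)
    return seen

# Hypothetical sample: "stop" is the choke point guarding "hidden" and "deep".
outgoing = {"n": ["a", "stop"], "a": ["b"], "stop": ["hidden"], "hidden": ["deep"]}
found = reachable_without_crossing("n", outgoing, blocked={"stop"})
```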
In a graph where the following nodes
A,B,C,D
have a relationship with each node's successor
(A->B)
and
(B->C)
etc.
How do I make a query that starts with A and gives me all nodes (and relationships) from that node outwards?
I do not know the end node (C).
All I know is to start from A and traverse the whole connected graph (with conditions on relationship and node type).
I think you need to use this pattern:
(n)-[*]->(m) - variable length path of any number of relationships from n to m. (see Refcard)
A sample query would be:
MATCH path = (a:A)-[*]->()
RETURN path
Also have a look at the path functions in the Refcard to extend your Cypher query (I don't know what exact conditions you'll need to apply).
To get all the nodes / relationships starting at a node:
MATCH (a:A {id: "id"})-[r*]-(b)
RETURN a, r, b
This will return all the graphs originating with node A / Label A where id = "id".
One caveat - if this graph is large the query will take a long time to run.
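As a rough picture of what "all nodes and relationships outwards from A" means, here is a small in-memory sketch. It follows outgoing edges only, as in the question, and the edge-tuple format `(source, rel_type, target)` and the function name are assumptions made for illustration. Each node and each relationship ends up in the result exactly once, however many paths pass through it.

```python
def collect_subgraph(start, edges):
    """Return (nodes, relationships) reachable from start along outgoing edges."""
    by_source = {}
    for edge in edges:
        by_source.setdefault(edge[0], []).append(edge)

    nodes = {start}
    rels = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for edge in by_source.get(node, []):
            rels.add(edge)          # sets keep every relationship once
            target = edge[2]
            if target not in nodes:  # expand every node once
                nodes.add(target)
                stack.append(target)
    return nodes, rels

# Hypothetical sample: the A -> B -> C -> D chain plus an edge pointing into A.
edges = [("A", "NEXT", "B"), ("B", "NEXT", "C"), ("C", "NEXT", "D"), ("X", "NEXT", "A")]
nodes, rels = collect_subgraph("A", edges)
```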
My graph is a tree structure with root and end nodes, and a line of nodes between them with [:NEXT]-> relationships from one to the next. Some nodes along that path also have [:BRANCH]-> relationships to other root nodes, and through them to other lines of nodes.
What Cypher query will return an ordered list of the nodes on the path from beginning to end, with any BRANCH relationships being included with the records for the nodes that have them?
EDIT: It's not a technical diagram, but the basic structure looks like this:
with each node depicted as a black circle. In this case, I would want every node depicted here.
How about
MATCH p=(root)-[:NEXT*0..]->(leaf)
OPTIONAL MATCH (leaf)-[:BRANCH]->(branched)
RETURN leaf, branched, length(p) as l
ORDER BY l ASC
see also this graph-gist: http://gist.neo4j.org/?9042990
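The shape of the result this query aims for can be sketched in memory as follows. The two dicts (`next_of` for NEXT, `branches_of` for BRANCH), the function name and the sample data are assumptions for illustration, and the sketch assumes a single NEXT successor per node with no cycles. Unlike the query, which returns one row per node and branch pair, it groups the branch targets per node.

```python
def ordered_with_branches(root, next_of, branches_of):
    """Walk the NEXT chain from root and attach each node's BRANCH targets."""
    rows = []
    node, depth = root, 0
    while node is not None:
        # depth plays the role of length(p) in the query, so rows come out ordered.
        rows.append((depth, node, list(branches_of.get(node, []))))
        node = next_of.get(node)  # None once the end of the line is reached
        depth += 1
    return rows

# Hypothetical sample: r -> n1 -> n2 -> end, with n1 branching to another root r2.
next_of = {"r": "n1", "n1": "n2", "n2": "end"}
branches_of = {"n1": ["r2"]}
rows = ordered_with_branches("r", next_of, branches_of)
```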
This query - a bit slow - should work (I guess):
START n=node(startID), child=node(*)
MATCH (n)-[rels*]-(child)
WHERE all(r in rels WHERE type(r) IN ["NEXT", "BRANCH"])
RETURN *
That is based on Neo4j 2.0.x Cypher syntax.
Technically this query will stop at the end of the tree started from startID: that is because the end in the diagram above belongs to a single path, but not the end of all the branches.
I would also recommend bounding the length of the variable-length relationship, e.g. [rels*1..n], to keep the query from running away.
You won't be able to control the order in which the nodes are returned (depth-first or breadth-first) unless you have a variable to hold the previous element or some kind of recursive call, which I don't think is possible using only Cypher.
What you can do
MATCH p =(n)-[:NEXT*]->(end)
WITH collect(p) as node_paths
MATCH (n1)-[:NEXT]->(m)-[:BRANCH]->(n2)
WITH collect(m) as branch_nodes , node_paths
RETURN branch_nodes,node_paths
Now node_paths contains all the paths matching the pattern (node)-[:NEXT]->(node)-[:NEXT]->...(node). Since you have both the paths and the branch nodes (the starting points of basically all the paths in node_paths except the one emerging from the root node), you can arrange the output order accordingly.
I'm struggling to find a single clean, efficient Cypher query that will let me identify all distinct paths emanating from a start node such that every relationship in the path is of the same type when there are many relationship types.
Here's a simple version of the model:
CREATE (a), (b), (c), (d), (e), (f), (g),
(a)-[:X]->(b)-[:X]->(c)-[:X]->(d)-[:X]->(e),
(a)-[:Y]->(c)-[:Y]->(f)-[:Y]->(g)
In this model (a) has two outgoing relationship types, X and Y. I would like to retrieve all the paths that link nodes along relationship X as well as all the paths that link nodes along relationship Y.
I can do this programmatically outside of Cypher by making a series of queries, the first to retrieve the list of outgoing relationships from the start node, and then a single query (submitted together as a batch) for each relationship. That looks like:
START n=node(1)
MATCH n-[r]->()
RETURN COLLECT(DISTINCT TYPE(r)) as rels;
followed by:
START n=node(1)
MATCH p = n-[:`reltype_param`*]->()
RETURN p as path;
The above satisfies my need, but requires at minimum 2 round trips to the server (again, assuming I batch together the second set of queries in one transaction).
A single-query approach that works, but is horribly inefficient is the following single Cypher query:
START n=node(1)
MATCH p = n-[r*]->() WHERE
ALL (x in RELATIONSHIPS(p) WHERE TYPE(x) = TYPE(HEAD(RELATIONSHIPS(p))))
RETURN p as path;
That query uses the ALL predicate to filter the relationships along the paths, enforcing that each relationship in the path matches the first relationship in the path. This, however, is really just a filter operation on what is essentially a combinatorial explosion of all possible paths, which is much less efficient than traversing a relationship of a known, given type first.
I feel like this should be possible with a single Cypher query, but I have not been able to get it right.
Here's a minor optimization; at least the non-matching paths will fail fast:
MATCH n-[r]->()
WITH DISTINCT n, type(r) AS t
MATCH p = n-[r*]->()
WHERE type(r[-1]) = t // last entry matches
RETURN p AS path
This is probably one of those things that should be in the Java API if you want it to be really performant, though.
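To illustrate the traversal the question is after (pick a relationship type at the start node, then follow only that type), here is a minimal in-memory sketch using the sample model from the question. It is not Neo4j or Java API code: the edge-tuple format and the function name are assumptions for illustration. As a simplification it refuses to revisit a node within a path, whereas Cypher only forbids reusing a relationship.

```python
def same_type_paths(start, edges):
    """Return (rel_type, node_path) pairs where every hop uses the same type."""
    by_source = {}
    for source, rel_type, target in edges:
        by_source.setdefault(source, []).append((rel_type, target))

    paths = []

    def extend(path, rel_type):
        # Only edges of the chosen type are ever expanded, so other types
        # are pruned during traversal instead of being filtered afterwards.
        for t, target in by_source.get(path[-1], []):
            if t == rel_type and target not in path:
                new_path = path + [target]
                paths.append((rel_type, new_path))
                extend(new_path, rel_type)

    for rel_type in sorted({t for t, _ in by_source.get(start, [])}):
        extend([start], rel_type)
    return paths

# The model from the question: an X chain a..e and a Y chain a, c, f, g.
edges = [
    ("a", "X", "b"), ("b", "X", "c"), ("c", "X", "d"), ("d", "X", "e"),
    ("a", "Y", "c"), ("c", "Y", "f"), ("f", "Y", "g"),
]
paths = same_type_paths("a", edges)
```

On this model the sketch yields four X paths and three Y paths, and never a path that switches from X to Y at node c.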