I have a huge (acyclic) graph and want to find all nodes reachable via relationship X from a given node. However, I do not want to cross any node with a certain attribute {attr:'donotcross'}, as such a node represents a choke point (i.e. it is the only node leading to an adjacent subgraph).
Currently I run a breadth-first search myself in Python, using a trivial Cypher query to fetch neighboring nodes and stopping the recursion as soon as I reach that specific node. This, however, is really slow, and I think that using pure Cypher to isolate those nodes could be faster.
What does the Cypher query look like returning all connected nodes via X but not traversing a node with property attr:'donotcross'?
My intuition would be
MATCH (n)-[:X*]->(inter)-[:X*]->(m) WHERE NOT inter.attr = 'donotcross' RETURN m
With n being the start node. However, this does not work, because the pattern can still match a path containing a forbidden node whenever there is more than just the forbidden node between the start and target nodes (any other intermediate node can play the role of inter).
Using Cypher alone, you can use the following approach:
MATCH path = (n)-[:X*]->(m) // best to use a label!
WHERE none(node in nodes(path) WHERE node.attr = 'donotcross')
RETURN DISTINCT m
Keep in mind you should at least be using labels for your starting node n, if you aren't able to look them up by an indexed property for a specific label.
Also, if there are relatively few of these donotcross nodes, and there is an index on attr for their label, it may be faster to first match these nodes, collect them, and then filter on that collection:
MATCH (x) // best to use a label and index lookup!
WHERE x.attr = 'donotcross'
WITH collect(x) as excluded
MATCH path = (n)-[:X*]->(m) // best to use a label!
WHERE none(node in nodes(path) WHERE node in excluded)
RETURN DISTINCT m
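For comparison, the client-side breadth-first search described in the question can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the graph is a small in-memory adjacency map (the names adj, attrs and reachable_without_crossing are hypothetical, not part of any Neo4j driver), and a node carrying attr 'donotcross' is neither returned nor expanded, which mirrors the none(...) filter above.

```python
from collections import deque

def reachable_without_crossing(adj, attrs, start):
    """Return nodes reachable from start via X edges without entering a 'donotcross' node."""
    seen = {start}
    queue = deque([start])
    result = set()
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            # Skip nodes already visited and never step onto a forbidden node.
            if nxt in seen or attrs.get(nxt) == "donotcross":
                continue
            seen.add(nxt)
            result.add(nxt)
            queue.append(nxt)
    return result

# Hypothetical sample data: "choke" is the only way to reach "hidden".
adj = {"n": ["a", "b"], "a": ["choke"], "choke": ["hidden"], "b": ["c"]}
attrs = {"choke": "donotcross"}
print(sorted(reachable_without_crossing(adj, attrs, "n")))
```

Each node is visited at most once here, whereas the Cypher variable-length match enumerates paths, so the two can differ a lot in cost on graphs with many alternative routes.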
Related
So I am trying to find all the nodes between a node of my choosing and a node that has the property Stop:true. Note: I would like the result to include the stop node.
So if I have a set of nodes like this:
(id:1,Stop:false)-(id:2,Stop:false)-(id:3,Stop:false)-(id:4,Stop:false)-
(id:5,Stop:false)-(id:6,Stop:True)-(id:7,Stop:false)-(id:8,Stop:false)
It would return
(id:1,Stop:false)-(id:2,Stop:false)-(id:3,Stop:false)-(id:4,Stop:false)-
(id:5,Stop:false)-(id:6,Stop:True)
So far I have
MATCH p=(a:Node{id:1})-[*]-(b:Node)
WHERE NOT b.Stop = true
RETURN p
But this query still returns nodes that are connected to the stop node. How do I make it show ALL the nodes up to the stop node?
The following query should return every path from your chosen node to a Stop node (i.e., a node that has a Stop value of true). If your DB has paths with multiple Stop nodes, then this query would return a path to each Stop node (meaning that some of the returned paths could contain multiple Stop nodes).
MATCH p=(a:Node{id:1})-[*]-(b:Node {Stop: true})
RETURN p;
However, if you only want paths that have a single Stop node (at the end), then this query should work:
MATCH p=(a:Node{id:1})-[*]-(b:Node {Stop: true})
WHERE NONE(n IN NODES(p)[1..-1] WHERE n.Stop)
RETURN p;
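To make the effect of the NONE(...) filter concrete, here is a small Python sketch on the eight-node chain from the question. It is an illustration only: the adjacency map, the stop flags and the function name paths_to_first_stop are hypothetical, and it assumes node-unique paths, whereas Cypher only requires relationships to be unique within a path.

```python
def paths_to_first_stop(adj, stop, start):
    """Simple paths from start that end at a Stop node with no Stop node in between."""
    found = []

    def walk(path):
        for nxt in adj[path[-1]]:
            if nxt in path:
                continue  # do not revisit a node on the current path
            if stop[nxt]:
                found.append(path + [nxt])  # include the stop node, then stop expanding
            else:
                walk(path + [nxt])

    walk([start])
    return found

# Undirected chain 1-2-...-8 with node 6 flagged as the stop node.
adj = {i: [j for j in (i - 1, i + 1) if 1 <= j <= 8] for i in range(1, 9)}
stop = {i: i == 6 for i in range(1, 9)}
print(paths_to_first_stop(adj, stop, 1))
```

Because expansion halts as soon as a stop node is reached, nodes 7 and 8 never appear in any returned path.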
[NOTE]
Variable-length path patterns (like ()-[*]-()) have exponential time and space complexity. If the average degree of a node is X, then traversing a variable-length path to depth Y imposes a complexity of O(X^Y). You would normally need to specify a reasonable upper bound on variable-length patterns (e.g., ()-[*..5]-()) to avoid running out of memory or having the query take seemingly forever to run. The upper bound you specify would depend on the nature of your query and your actual data characteristics.
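As a rough illustration of that growth, the following Python sketch counts the paths an unbounded pattern would have to enumerate in an idealised tree where every node has the same number of children. The function name count_paths and the tree model are assumptions made for the example; real graphs are less regular.

```python
def count_paths(branching, max_depth):
    """Count paths of length 1..max_depth from the root of a tree
    in which every node has `branching` children."""
    return sum(branching ** depth for depth in range(1, max_depth + 1))

# With an average degree of 3, the path count explodes as the depth bound grows.
for depth in (1, 2, 5, 10):
    print(depth, count_paths(3, depth))
```

Even at a modest degree of 3, raising the bound from 5 to 10 multiplies the number of paths by a factor of several hundred, which is why an explicit upper bound matters.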
Take the above image as an example. Using Cypher, how would I match all of the nodes except for the longest chain and the central node? I.e. all nodes within exactly one hop of the central node whilst excluding the central node (all nodes and edges except 3 nodes and 2 edges).
I have tried the following:
MATCH (n:Node) WHERE n.id = "123" MATCH path = (m)-[*1..1]->(n) RETURN m
This very nearly works, however it still returns the central node (i.e. node n). How would I exclude this node from my query result?
[UPDATED]
This will return all distinct nodes directly connected to the specified node, and explicitly prevents the specified node from being returned (in case it has a relationship to itself):
MATCH (n:Node)--(m)
WHERE n.id = "123" AND n <> m
RETURN DISTINCT m;
Ideally I would have liked to match the nodes mentioned in my question and delete them. However, as I have not found a way to do so, an inverse approach can be used: match all nodes except those mentioned in the question, which effectively excludes (but does not delete) the unwanted nodes.
This can be achieved using this query:
MATCH (n:Node) WHERE n.id = "123" MATCH path = (m)-[*2..]->(n) RETURN path
This returns the central node and all paths to that node that have a "length" greater than or equal to 2.
In a graph where the following nodes
A,B,C,D
have a relationship with each node's successor
(A->B)
and
(B->C)
etc.
How do I write a query that starts at A and gives me all nodes (and relationships) from there outwards?
I do not know the end node (C).
All I know is that I start from A and traverse the whole connected graph (with conditions on relationship and node type).
I think you need to use this pattern:
(n)-[*]->(m) - variable length path of any number of relationships from n to m. (see Refcard)
A sample query would be:
MATCH path = (a:A)-[*]->()
RETURN path
Also have a look at the path functions in the refcard to extend your Cypher query (I don't know exactly which conditions you'll need to apply).
To get all the nodes / relationships starting at a node:
MATCH (a:A {id: "id"})-[r*]-(b)
RETURN a, r, b
This will return all the graphs originating with node A / Label A where id = "id".
One caveat - if this graph is large the query will take a long time to run.
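To show what "all nodes and relationships reachable from A" amounts to, here is a minimal Python sketch that collects the connected subgraph with a breadth-first search, ignoring direction as the undirected pattern above does. The edge list, the NEXT relationship type and the function name connected_subgraph are hypothetical sample data, not taken from the question.

```python
from collections import deque

def connected_subgraph(edges, start):
    """Collect every node and relationship reachable from start, ignoring direction."""
    neighbours = {}
    for src, rel_type, dst in edges:
        neighbours.setdefault(src, []).append((src, rel_type, dst))
        neighbours.setdefault(dst, []).append((src, rel_type, dst))
    nodes, rels = {start}, set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for edge in neighbours.get(node, ()):
            rels.add(edge)
            # The node at the other end of this relationship.
            other = edge[2] if edge[0] == node else edge[0]
            if other not in nodes:
                nodes.add(other)
                queue.append(other)
    return nodes, rels

# A -> B -> C -> D is one component; E -> F is unrelated and must not be returned.
edges = [("A", "NEXT", "B"), ("B", "NEXT", "C"), ("C", "NEXT", "D"), ("E", "NEXT", "F")]
nodes, rels = connected_subgraph(edges, "A")
print(sorted(nodes))
print(sorted(rels))
```

This visits each node and relationship once, while the Cypher pattern returns one row per path, which is part of why the query can be slow on a large graph.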
What I'm trying to do is simply start at a node and search for all connected nodes that are a certain label. However I don't want to return the start node. How would I do this?
Example:
...<-[:parent]<-anode<-[created]-user-[created]->anode-[:parent]->anode-....->nodes...
What I would like to do is start at the user node and return all relationships but excluding the user node.
This will return you a list of all nodes connected via created relationships of a distance of up to 10.
MATCH (user)-[:created*1..10]->(anode:CertainLabel)
RETURN DISTINCT anode
Depending on your graph, you may be able to get rid of the 10, but if the graph is large and complex, removing the maximum could cause your query to run very slowly.
This is along the lines of what I was looking for.
START u = node(26)
MATCH (u)-[rels*1..10]->(node)
UNWIND rels AS r
RETURN DISTINCT id(startNode(r)),endNode(r)
I'm struggling to find a single clean, efficient Cypher query that will let me identify all distinct paths emanating from a start node such that every relationship in the path is of the same type when there are many relationship types.
Here's a simple version of the model:
CREATE (a), (b), (c), (d), (e), (f), (g),
(a)-[:X]->(b)-[:X]->(c)-[:X]->(d)-[:X]->(e),
(a)-[:Y]->(c)-[:Y]->(f)-[:Y]->(g)
In this model (a) has two outgoing relationship types, X and Y. I would like to retrieve all the paths that link nodes along relationship X as well as all the paths that link nodes along relationship Y.
I can do this programmatically outside of Cypher by making a series of queries, the first to retrieve the list of outgoing relationships from the start node, and then a single query (submitted together as a batch) for each relationship. That looks like:
START n=node(1)
MATCH n-[r]->()
RETURN COLLECT(DISTINCT TYPE(r)) as rels;
followed by:
START n=node(1)
MATCH p = n-[:`reltype_param`*]->()
RETURN p as path;
The above satisfies my need, but requires at minimum 2 round trips to the server (again, assuming I batch together the second set of queries in one transaction).
A single-query approach that works, but is horribly inefficient, is the following:
START n=node(1)
MATCH p = n-[r*]->() WHERE
ALL (x in RELATIONSHIPS(p) WHERE TYPE(x) = TYPE(HEAD(RELATIONSHIPS(p))))
RETURN p as path;
That query uses the ALL predicate to filter the relationships along the paths, enforcing that each relationship in the path matches the first relationship in the path. This, however, is really just a filter operation on what is essentially a combinatorial explosion of all possible paths, which is much less efficient than traversing a relationship of a known, given type first.
I feel like this should be possible with a single Cypher query, but I have not been able to get it right.
Here's a minor optimization; at least the non-matching paths will fail fast:
START n=node(1)
MATCH n-[r]->()
WITH DISTINCT n, type(r) AS t // keep n in scope for the second MATCH
MATCH p = n-[r*]->()
WHERE type(r[-1]) = t // last entry matches
RETURN p AS path
This is probably one of those things that should be in the Java API if you want it to be really performant, though.
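The idea of traversing one known relationship type at a time, rather than filtering all paths afterwards, can be sketched outside the database in Python. This is not the Java API mentioned above, only an in-memory illustration on the sample model from the question; the edge-list representation and the function name single_type_paths are assumptions made for the example.

```python
def single_type_paths(edges, start):
    """Enumerate outgoing paths from start in which every relationship has the same type."""
    out = {}
    for src, rel_type, dst in edges:
        out.setdefault(src, []).append((rel_type, dst))
    paths = []

    def extend(path, rel_type):
        # Only follow relationships of the chosen type, so mixed paths are never built.
        for t, nxt in out.get(path[-1], ()):
            if t == rel_type and nxt not in path:
                new_path = path + [nxt]
                paths.append((rel_type, new_path))
                extend(new_path, rel_type)

    # One traversal per distinct type leaving the start node.
    for rel_type in sorted({t for t, _ in out.get(start, ())}):
        extend([start], rel_type)
    return paths

# The model from the question: an X chain a..e and a Y chain a, c, f, g.
edges = [("a", "X", "b"), ("b", "X", "c"), ("c", "X", "d"), ("d", "X", "e"),
         ("a", "Y", "c"), ("c", "Y", "f"), ("f", "Y", "g")]
for rel_type, path in single_type_paths(edges, "a"):
    print(rel_type, "->".join(path))
```

Because the type is fixed before each traversal starts, paths that switch from X to Y at node c are pruned immediately instead of being generated and then discarded.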