neo4j query to exclude nodes related to nodes with certain properties - neo4j

I am trying to write a neo4j query where I only want to present nodes that are have no relation to nodes with a specific property. One way to think of it is where two separate graphs exist where one node has the property I want to exclude. I should get a result that only contains the graph of the set of nodes not connected to the node that has the property I want to exclude. This is what the graph looks like before my query
match (n) where not (n{property:'valueIWishToExclude'})--() return n
This is what the result of the query looks like
I only want to have the four connected nodes in my results. How can I set up a query that excludes the nodes that are not connected to the node with the property I wish to exclude?

In fact you need those nodes from which there is no path to the node that should be excluded. You can use the shortestPath function and ALL predicate:
match (ex) where n.property = 'valueIWishToExclude'
with collect(ex) as exn
match (n) where (not n.property = 'valueIWishToExclude') and
ALL(e in exn where not shortestPath( (n)-[*]-(e) ) is null)
return n

You are almost there, just add in the relationship in your query to only get the nodes that are related to each other
MATCH (n:label) -[:RELATED]->() where n.property<>'exclude'
RETURN n
That should return only the nodes connected to each other, as the other nodes do not have that relationship.
Let me know if that worked for you.

You may want to alter your wording a bit, what you're asking for in this question, and what you really want, are not the same thing.
In Neo4j (and most graph databases), the phrase "nodes that have no relation to..." means nodes that are not connected by a relationship to the node in question.
In that context, in your right graph (assuming the one node selected is the node marked as excluded), one node would fit the criteria and be returned as a possible result, the topmost node, since it doesn't have a relationship to the node you want to exclude; It is however two relationships removed from the excluded node.
You seem to be asking for something else, though. You seem to want nodes that are not in the same subgraph as the node to exclude. Or, alternately, nodes that have no path to the excluded node.
Make sure on future queries you're clear about what you're asking, or you'll get answers that have no relevance to what you really want.
One approach that will work is to first find all nodes within the subgraph of the excluded node, and then return all nodes that are not in those subgraph nodes.
You'll want to install APOC Procedures so you can make use of a fast means of obtaining nodes within the subgraph.
You'll also want to use labels in your graph, and maybe put an index on the property you're searching for as this will make your search fast. As it is now, your query must examine every node in your entire database to find nodes with the property in question, and that will become slower and slower as your graph grows.
Your query might look like this (using 'Label' as a stand-in for the node label):
MATCH (n:Label{propertyToExclude:'valueToExclude'})
CALL apoc.path.expandConfig(n, {bfs:true, uniqueness:"NODE_GLOBAL"}) YIELD path
WITH COLLECT(DISTINCT LAST(NODES(path))) as subgraph
MATCH (n)
WHERE NOT n in subgraph
RETURN n

Related

How to get all connected nodes in neo4j

I want to get list of all connected nodes starting from node 0 as shown in the diagram
Based on your comment:
I want to get a list of all the connected nodes. For example in the
above case when I search for connected nodes for 0, it should return
nodes- 1,2,3
This query will do what you want:
MATCH ({id : 0})-[*]-(connected)
RETURN connected
The above query will return all nodes connected with a node with id=0 (I'm considering that the numbers inside the nodes are values of an id property) in any depth, both directions and considering any relationship type. Take a look in the section Relationships in depth of the docs.
While this will work fine for small graphs note that this is a very expensive operation. It will go through the entire graph starting from the start point ({id : 0}) considering any relationship type. This is really not a good idea for production environments.
If you wish to match the nodes that have a relationship to another node, you can use this:
MATCH (n) MATCH (n)-[r]-() RETURN n,r
It will return you all the nodes that have a relationship to another node or nodes, irrespective of the direction of the relationship.
If you wish to add a constraint you can do it this way:
MATCH (n:Label {id:"id"}) MATCH (n)-[r]-() RETURN n,r
For larger or more heavily interconnected graphs, APOC Procedures offers a more efficient means of traversal that returns all nodes in a subgraph.
As others have already mentioned, it's best to use labels on your nodes, and add either an index or a unique constraint on the label+property for fast lookup of your starting node.
Using a label of "Label", and a parameter of idParam, a query to get nodes of the subgraph with APOC would be:
MATCH (n:Label {id:$idParam})
CALL apoc.path.subgraphNodes(n, {minLevel:1}) YIELD node
RETURN node
Nodes will be distinct, and the starting node will not be returned with the rest.
EDIT
There's currently a restriction preventing usage of minLevel in subgraphNodes(), you can use either filter out the starting node yourself, or use apoc.path.expandConfig() using uniqueness:'NODE_GLOBAL' to get the same effect.

Neo4j Cypher: Match and Delete the subgraph based on value of node property

Suppose I have 3 subgraphs in Neo4j and I would like to select and delete the whole subgraph if all the nodes in the subgraph matching the filtering criteria that is each node's property value <= 1. However if there is atleast one node within the subgraph that is not matching the criteria then the subgraph will not be deleted.
In this case the left subgraph will be deleted but the right subgraph and the middle one will stay. The right one will not be deleted even though it has some nodes with value 1 because there are also nodes with values greater than 1.
userids and values are the node properties.
I will be thankful if anyone can suggest me the cypher query that can be used to do that. Please note that the query will be on the whole graph, that is on all three subgraphs or more if there are anymore.
Thanks for the clarification, that's a tricky requirement, and it's not immediately clear to me what the best approach is that will scale well with large graphs, as most possibilities seem to be expensive full graph operations. We'll likely need to use a few steps to set up the graph for easier querying later. I'm also assuming you mean "disconnected subgraphs", otherwise this answer won't work.
One start might be to label nodes as :Alive or :Dead based upon the property value. It should help if all nodes are of the same label, and if there's an index on the value property for that label, as our match operations could take advantage of the index instead of having to do a full label scan and property comparison.
MATCH (a:MyNode)
WHERE a.value <= 1
SET a:Dead
And separately
MATCH (a:MyNode)
WHERE a.value > 1
SET a:Alive
Then your query to mark nodes to delete would be:
MATCH (a:Dead)
WHERE NOT (a)-[*]-(:Alive)
SET a:ToDelete
And if all looks good with the nodes you've marked for delete, you can run your delete operation, using apoc.periodic.commit() from APOC Procedures to batch the operation if necessary.
MATCH (a:ToDelete)
DETACH DELETE a
If operations on disconnected subgraphs are going to be common, I highly encourage using a special node connected to each subgraph you create (such as a single :Cluster node at the head of the subgraph) so you can begin such operations on :Cluster nodes, which would greatly speed up these kind of queries, since your query operations would be executed per cluster, instead of per :Dead node.

How to filter results by node label in neo4j cypher?

I have a graph database that maps out connections between buildings and bus stations, where the graph contains other connecting pieces like roads and intersections (among many node types).
What I'm trying to figure out is how to filter a path down to only return specific node types. I have two related questions that I'm currently struggling with.
Question 1: How do I return the labels of nodes along a path?
It seems like a logical first step is to determine what type of nodes occur along the path.
I have tried the following:
MATCH p=(a:Building)­-[:CONNECTED_TO*..5]­-(b:Bus)
WITH nodes(p) AS nodes
RETURN DISTINCT labels(nodes);
However, I'm getting a type exception error that labels() expects data of type node and not Collection. I'd like to dynamically know what types of nodes are on my paths so that I can eventually filter my paths.
Question 2: How can I return a subset of the nodes in a path that match a label I identified in the first step?
Say I found that that between (a:Building) and (d1:Bus) and (d2:Bus) I can expect to find (:Intersection) nodes and (:Street) nodes.
This is a simplified model of my graph:
(a:Building)­­--(:Street)­--­(:Street)--­­(b1:Bus)
\­­(:Street)--­­(:Intersection)­­--(:Street)--­­(b2:Bus)
I've written a MATCH statement that would look for all possible paths between (:Building) and (:Bus) nodes. What would I need to do next to filter to selectively return the Street nodes?
MATCH p=(a:Building)-[r:CONNECTED_TO*]-(b:Bus)
// Insert logic to only return (:Street) nodes from p
Any guidance on this would be greatly appreciated!
To get the distinct labels along matching paths:
MATCH p=(a:Building)-[:CONNECTED_TO*..5]-(b:Bus)
WITH NODES(p) AS nodes
UNWIND nodes AS n
WITH LABELS(n) AS ls
UNWIND ls AS label
RETURN DISTINCT label;
To return the nodes that have the Street label.
MATCH p=(a:Building)-[r:CONNECTED_TO*]-(b:Bus)
WITH NODES(p) AS nodes
UNWIND nodes AS n
WITH n
WHERE 'Street' IN LABELS(n)
RETURN n;
Cybersam's answers are good, but their output is simply a column of labels...you lose the path information completely. So if there are multiple paths from a :Building to a :Bus, the first query will only output all labels in all nodes in all patterns, and you can't tell how many paths exist, and since you lose path information, you cannot tell what labels are in some paths but not others, or common between some paths.
Likewise, the second query loses path information, so if there are multiple paths using different streets to get from a :Building to a :Bus, cybersam's query will return all streets involved in all paths. It is possible for it to output all streets in your graph, which doesn't seem very useful.
You need queries that preserve path information.
For 1, finding the distinct labels on nodes on each path I would offer this query:
MATCH p=(:Building)-[:CONNECTED_TO*..5]-(:Bus)
WITH NODES(p) AS nodes
WITH REDUCE(myLabels = [], node in nodes | myLabels + labels(node)) as myLabels
RETURN DISTINCT myLabels
For 2, this query preserves path information:
MATCH p=(:Building)-[:CONNECTED_TO*..5]-(:Bus)
WITH NODES(p) AS nodes
WITH FILTER(node in nodes WHERE (node:Street)) as pathStreets
RETURN pathStreets
Note that these are both expensive operations, as they perform a cartesian product of all buildings and all busses, as in the queries in your description. I highly recommend narrowing down the buildings and busses you're matching upon, hopefully to very few or specific buildings at least.
I also encourage limiting how deep you're looking in your pattern. I get the idea that many, if not most, of your nodes in your graph are connected by :CONNECTED_TO relationships, and if we don't cap that to a reasonable amount, your query could be finding every single path through your entire graph, no matter how long or convoluted or nonsensical, and I don't think that's what you want.

How to find distinct nodes in a Neo4j/Cypher query

I'm trying to do some pattern matching in neo4j/cypher and I came across this issue:
There are two types of graphs I want to search for:
Star graphs: A graph with one center node and multiple outgoing relationships.
n-length line graphs: A line graph with length n where none of the nodes are repeats (I have some bidirectional edges and cycles in my graph)
So the main problem is that when I do something such as:
MATCH a-->b, a-->c, a-->d
MATCH a-->b-->c-->d
Cypher doesn't guarantee (when I tried it) that a, b, c, and d are all different nodes. For small graphs, this can easily be fixed with
WHERE not(a=b) AND not(a=c) AND ...
But I'm trying to have graphs of size 10+, so checking equality between all nodes isn't a viable option. Afaik, RETURN DISTINCT does not work as well since it doesn't check equality among variables, only across different rows. Is there any simple way I can specify the query to make the differently named nodes distinct?
Old question, but look to APOC Path Expander procedures for how to address these kinds of use cases, as you can change the traversal uniqueness behavior for expansion (the same way you can when using the traversal API...which these procedures use).
Cypher implicitly uses RELATIONSHIP_PATH uniqueness, meaning that per path returned, a relationship must be unique, it cannot be used multiple times in a single path.
While this is good for queries where you need all possible paths, it's not a good fit for queries where you want distinct nodes or a subgraph or to prevent repeating nodes in a path.
For an n-length path, let's say depth 6 with only outgoing relationships of any type, we can change the uniqueness to NODE_PATH, where a node must be unique per path, no repeats in a path:
MATCH (n)
WHERE id(n) = 12345
CALL apoc.path.expandConfig(n, {maxLevel:6, uniqueness:'NODE_PATH'}) YIELD path
RETURN path
If you want all reachable nodes up to a certain depth (or at any depth by omitting maxLevel), you can use NODE_GLOBAL uniqueness, or instead just use apoc.path.subgraphNodes():
MATCH (n)
WHERE id(n) = 12345
CALL apoc.path.subgraphNodes(n, {maxLevel:6}) YIELD node
RETURN node
NODE_GLOBAL uniqueness means that across all paths that a node must be unique, it will only be visited once, and there will only be one path to a node from a given start node. This keeps the number of paths that need to be evaluated down significantly, but because of this behavior not all relationships will be traversed, if they expand to a node already visited.
You will not get relationships back with this procedure (you can use apoc.path.spanningTree() for that, although as previously mentioned not all relationships will be included, as we will only capture a single path to each node, not all possible paths to nodes). If you want all nodes up to a max level and all possible relationships between those nodes, then use apoc.path.subgraphAll():
MATCH (n)
WHERE id(n) = 12345
CALL apoc.path.subgraphAll(n, {maxLevel:6}) YIELD nodes, relationships
RETURN nodes, relationships
Richer options exist for label and relationship filtering, or filtering (whitelist, blacklist, endnode, terminator node) based on lists of pre-matched nodes.
We also support repeating sequences of relationships or node labels.
If you need filtering by node or relationship properties during expansion, then this won't be a good option as that feature is yet supported.

Extract subgraph in neo4j

I have a large network stored in Neo4j. Based on a particular root node, I want to extract a subgraph around that node and store it somewhere else. So, what I need is the set of nodes and edges that match my filter criteria.
Afaik there is no out-of-the-box solution available. There is a graph matching component available, but it works only for perfect matches. The Neo4j API itself defines only graph traversal which I can use to define which nodes/edges should be visited:
Traverser exp = Traversal
.description()
.breadthFirst()
.evaluator(Evaluators.toDepth(2))
.traverse(root);
Now, I can add all nodes/edges to sets for all paths, but this is very inefficient. How would you do it? Thanks!
EDIT Would it make sense to add the last node and the last relationship of each traversal to the subgraph?
As for graph matching, that has been superseded by http://docs.neo4j.org/chunked/snapshot/cypher-query-lang.html which would fit nicely, and supports fuzzy matchin with optional relationships.
For subgraph representation, I would use the Cypher output to maybe construct new Cypher statements for recreating the graph, much like a SQL export, something like
start n=node:node_auto_index(name='Neo')
match n-[r:KNOWS*]-m
return "create ({name:'"+m.name+"'});"
http://console.neo4j.org/r/pqf1rp for an example
I solved it by constructing the induced subgraph based on all traversal endpoints.
Building the subgraph from the set of last nodes and edges of every traversal does not work, because edges that are not part of any shortest paths would not be included.
The code snippet looks like this:
Set<Node> nodes = new HashSet<Node>();
Set<Relationship> edges = new HashSet<Relationship>();
for (Node n : traverser.nodes())
{
nodes.add(n);
}
for (Node node : nodes)
{
for (Relationship rel : node.getRelationships())
{
if (nodes.contains(rel.getOtherNode(node)))
edges.add(rel);
}
}
Every edge is added twice. One time for the outgoing node and one time for the incoming node. Using a Set, I can ensure that it's in the collection only once.
It is possible to iterate over incoming/outgoing edges only, but it is unclear how loops (edge from a node to itself) are handled. To which category do they belong to? This snippet does not have this issue.
See dumping the database to cypher statements
dump START n=node({self}) MATCH p=(n)-[r:KNOWS*]->(m) RETURN n,r,m;
There's also an example for importing the subgraph of first database (db1) into a second (db2).

Resources