How to filter results by node label in neo4j cypher? - neo4j

I have a graph database that maps out connections between buildings and bus stations, where the graph contains other connecting pieces like roads and intersections (among many node types).
What I'm trying to figure out is how to filter a path down to only return specific node types. I have two related questions that I'm currently struggling with.
Question 1: How do I return the labels of nodes along a path?
It seems like a logical first step is to determine what type of nodes occur along the path.
I have tried the following:
MATCH p=(a:Building)­-[:CONNECTED_TO*..5]­-(b:Bus)
WITH nodes(p) AS nodes
RETURN DISTINCT labels(nodes);
However, I'm getting a type exception error that labels() expects data of type node and not Collection. I'd like to dynamically know what types of nodes are on my paths so that I can eventually filter my paths.
Question 2: How can I return a subset of the nodes in a path that match a label I identified in the first step?
Say I found that that between (a:Building) and (d1:Bus) and (d2:Bus) I can expect to find (:Intersection) nodes and (:Street) nodes.
This is a simplified model of my graph:
(a:Building)­­--(:Street)­--­(:Street)--­­(b1:Bus)
\­­(:Street)--­­(:Intersection)­­--(:Street)--­­(b2:Bus)
I've written a MATCH statement that would look for all possible paths between (:Building) and (:Bus) nodes. What would I need to do next to filter to selectively return the Street nodes?
MATCH p=(a:Building)-[r:CONNECTED_TO*]-(b:Bus)
// Insert logic to only return (:Street) nodes from p
Any guidance on this would be greatly appreciated!

To get the distinct labels along matching paths:
MATCH p=(a:Building)-[:CONNECTED_TO*..5]-(b:Bus)
WITH NODES(p) AS nodes
UNWIND nodes AS n
WITH LABELS(n) AS ls
UNWIND ls AS label
RETURN DISTINCT label;
To return the nodes that have the Street label.
MATCH p=(a:Building)-[r:CONNECTED_TO*]-(b:Bus)
WITH NODES(p) AS nodes
UNWIND nodes AS n
WITH n
WHERE 'Street' IN LABELS(n)
RETURN n;

Cybersam's answers are good, but their output is simply a column of labels...you lose the path information completely. So if there are multiple paths from a :Building to a :Bus, the first query will only output all labels in all nodes in all patterns, and you can't tell how many paths exist, and since you lose path information, you cannot tell what labels are in some paths but not others, or common between some paths.
Likewise, the second query loses path information, so if there are multiple paths using different streets to get from a :Building to a :Bus, cybersam's query will return all streets involved in all paths. It is possible for it to output all streets in your graph, which doesn't seem very useful.
You need queries that preserve path information.
For 1, finding the distinct labels on nodes on each path I would offer this query:
MATCH p=(:Building)-[:CONNECTED_TO*..5]-(:Bus)
WITH NODES(p) AS nodes
WITH REDUCE(myLabels = [], node in nodes | myLabels + labels(node)) as myLabels
RETURN DISTINCT myLabels
For 2, this query preserves path information:
MATCH p=(:Building)-[:CONNECTED_TO*..5]-(:Bus)
WITH NODES(p) AS nodes
WITH FILTER(node in nodes WHERE (node:Street)) as pathStreets
RETURN pathStreets
Note that these are both expensive operations, as they perform a cartesian product of all buildings and all busses, as in the queries in your description. I highly recommend narrowing down the buildings and busses you're matching upon, hopefully to very few or specific buildings at least.
I also encourage limiting how deep you're looking in your pattern. I get the idea that many, if not most, of your nodes in your graph are connected by :CONNECTED_TO relationships, and if we don't cap that to a reasonable amount, your query could be finding every single path through your entire graph, no matter how long or convoluted or nonsensical, and I don't think that's what you want.

Related

How to get all connected nodes in neo4j

I want to get list of all connected nodes starting from node 0 as shown in the diagram
Based on your comment:
I want to get a list of all the connected nodes. For example in the
above case when I search for connected nodes for 0, it should return
nodes- 1,2,3
This query will do what you want:
MATCH ({id : 0})-[*]-(connected)
RETURN connected
The above query will return all nodes connected with a node with id=0 (I'm considering that the numbers inside the nodes are values of an id property) in any depth, both directions and considering any relationship type. Take a look in the section Relationships in depth of the docs.
While this will work fine for small graphs note that this is a very expensive operation. It will go through the entire graph starting from the start point ({id : 0}) considering any relationship type. This is really not a good idea for production environments.
If you wish to match the nodes that have a relationship to another node, you can use this:
MATCH (n) MATCH (n)-[r]-() RETURN n,r
It will return you all the nodes that have a relationship to another node or nodes, irrespective of the direction of the relationship.
If you wish to add a constraint you can do it this way:
MATCH (n:Label {id:"id"}) MATCH (n)-[r]-() RETURN n,r
For larger or more heavily interconnected graphs, APOC Procedures offers a more efficient means of traversal that returns all nodes in a subgraph.
As others have already mentioned, it's best to use labels on your nodes, and add either an index or a unique constraint on the label+property for fast lookup of your starting node.
Using a label of "Label", and a parameter of idParam, a query to get nodes of the subgraph with APOC would be:
MATCH (n:Label {id:$idParam})
CALL apoc.path.subgraphNodes(n, {minLevel:1}) YIELD node
RETURN node
Nodes will be distinct, and the starting node will not be returned with the rest.
EDIT
There's currently a restriction preventing usage of minLevel in subgraphNodes(), you can use either filter out the starting node yourself, or use apoc.path.expandConfig() using uniqueness:'NODE_GLOBAL' to get the same effect.

Optimizing Cypher Query

I am currently starting to work with Neo4J and it's query language cypher.
I have a multple queries that follow the same pattern.
I am doing some comparison between a SQL-Database and Neo4J.
In my Neo4J Datababase I habe one type of label (person) and one type of relationship (FRIENDSHIP). The person has the propterties personID, name, email, phone.
Now I want to have the the friends n-th degree. I also want to filter out those persons that are also friends with a lower degree.
FOr example if I want to search for the friends 3 degree I want to filter out those that are also friends first and/or second degree.
Here my query type:
MATCH (me:person {personID:'1'})-[:FRIENDSHIP*3]-(friends:person)
WHERE NOT (me:person)-[:FRIENDSHIP]-(friends:person)
AND NOT (me:person)-[:FRIENDSHIP*2]-(friends:person)
RETURN COUNT(DISTINCT friends);
I found something similiar somewhere.
This query works.
My problem is that this pattern of query is much to slow if I search for a higher degree of friendship and/or if the number of persons becomes more.
So I would really appreciate it, if somemone could help me with optimize this.
If you just wanted to handle depths of 3, this should return the distinct nodes that are 3 degrees away but not also less than 3 degrees away:
MATCH (me:person {personID:'1'})-[:FRIENDSHIP]-(f1:person)-[:FRIENDSHIP]-(f2:person)-[:FRIENDSHIP]-(f3:person)
RETURN apoc.coll.subtract(COLLECT(f3), COLLECT(f1) + COLLECT(f2) + me) AS result;
The above query uses the APOC function apoc.coll.subtract to remove the unwanted nodes from the result. The function also makes sure the collection contains distinct elements.
The following query is more general, and should work for any given depth (by just replacing the number after *). For example, this query will work with a depth of 4:
MATCH p=(me:person {personID:'1'})-[:FRIENDSHIP*4]-(:person)
WITH NODES(p)[0..-1] AS priors, LAST(NODES(p)) AS candidate
UNWIND priors AS prior
RETURN apoc.coll.subtract(COLLECT(DISTINCT candidate), COLLECT(DISTINCT prior)) AS result;
The problem with Cypher's variable-length relationship matching is that it's looking for all possible paths to that depth. This can cause unnecessary performance issues when all you're interested in are the nodes at certain depths and not the paths to them.
APOC's path expander using 'NODE_GLOBAL' uniqueness is a more efficient means of matching to nodes at inclusive depths.
When using 'NODE_GLOBAL' uniqueness, nodes are only ever visited once during traversal. Because of this, when we set the path expander's minLevel and maxLevel to be the same, the result are nodes at that level that are not present at any lower level, which is exactly the result you're trying to get.
Try this query after installing APOC:
MATCH (me:person {personID:'1'})
CALL apoc.path.expandConfig(me, {uniqueness:'NODE_GLOBAL', minLevel:4, maxLevel:4}) YIELD path
// a single path for each node at depth 4 but not at any lower depth
RETURN COUNT(path)
Of course you'll want to parameterize your inputs (personID, level) when you get the chance.

neo4j query to exclude nodes related to nodes with certain properties

I am trying to write a neo4j query where I only want to present nodes that are have no relation to nodes with a specific property. One way to think of it is where two separate graphs exist where one node has the property I want to exclude. I should get a result that only contains the graph of the set of nodes not connected to the node that has the property I want to exclude. This is what the graph looks like before my query
match (n) where not (n{property:'valueIWishToExclude'})--() return n
This is what the result of the query looks like
I only want to have the four connected nodes in my results. How can I set up a query that excludes the nodes that are not connected to the node with the property I wish to exclude?
In fact you need those nodes from which there is no path to the node that should be excluded. You can use the shortestPath function and ALL predicate:
match (ex) where n.property = 'valueIWishToExclude'
with collect(ex) as exn
match (n) where (not n.property = 'valueIWishToExclude') and
ALL(e in exn where not shortestPath( (n)-[*]-(e) ) is null)
return n
You are almost there, just add in the relationship in your query to only get the nodes that are related to each other
MATCH (n:label) -[:RELATED]->() where n.property<>'exclude'
RETURN n
That should return only the nodes connected to each other, as the other nodes do not have that relationship.
Let me know if that worked for you.
You may want to alter your wording a bit, what you're asking for in this question, and what you really want, are not the same thing.
In Neo4j (and most graph databases), the phrase "nodes that have no relation to..." means nodes that are not connected by a relationship to the node in question.
In that context, in your right graph (assuming the one node selected is the node marked as excluded), one node would fit the criteria and be returned as a possible result, the topmost node, since it doesn't have a relationship to the node you want to exclude; It is however two relationships removed from the excluded node.
You seem to be asking for something else, though. You seem to want nodes that are not in the same subgraph as the node to exclude. Or, alternately, nodes that have no path to the excluded node.
Make sure on future queries you're clear about what you're asking, or you'll get answers that have no relevance to what you really want.
One approach that will work is to first find all nodes within the subgraph of the excluded node, and then return all nodes that are not in those subgraph nodes.
You'll want to install APOC Procedures so you can make use of a fast means of obtaining nodes within the subgraph.
You'll also want to use labels in your graph, and maybe put an index on the property you're searching for as this will make your search fast. As it is now, your query must examine every node in your entire database to find nodes with the property in question, and that will become slower and slower as your graph grows.
Your query might look like this (using 'Label' as a stand-in for the node label):
MATCH (n:Label{propertyToExclude:'valueToExclude'})
CALL apoc.path.expandConfig(n, {bfs:true, uniqueness:"NODE_GLOBAL"}) YIELD path
WITH COLLECT(DISTINCT LAST(NODES(path))) as subgraph
MATCH (n)
WHERE NOT n in subgraph
RETURN n

Find the nodes in a Neo subgraph

I have a cyclic subgraph. I would like to know all the relationships in that subgraph. I don't know how deep the subgraph is, nor do I want to hardcode any relationship types.
The best thing I have found so far is driven by this snippet.
match(n:X)-[r*]->(m)
from r, I can find what I need. However, even for a small subgraph the cardinality of r* can be 30k or more. There is no point for Neo to calculate every path through the subgraph. I really just need the nodes or the individual relationships (preferred).
What is a way to just get the individual relationships in a subgraph? We're using Cypher.
Cypher provides no way to get all the relationships in a subgraph without following the paths. Besides, it has to explore those paths anyway in order to figure out what nodes and relationships belong to the subgraph.
To ensure that you get each relationship in a cyclic subgraph only once, you can do this:
MATCH p=(:Foo)-[*]->()
WITH RELATIONSHIPS(p) AS ps
UNWIND ps AS p
RETURN DISTINCT p;
Note, however, that variable-length path queries with no upper bound can be very expensive and may end up running "forever".
Alternate approach
If you can identify all the nodes in the desired subgraph, then there can be a more performant approach.
For example, let's suppose that all the nodes in the desired subgraph (and only those nodes) have the label X. In that case, this quick query will return all the relationships in the subgraph:
MATCH p=(:Foo)-[r]->()
RETURN r;
You can collect all nodes in a connected components with a breadth first or depth first search without filter.
The neo4j REST API has a traversal endpoint which can be used to do exactly that. It's not a Cypher query, but it could solve your problem: http://neo4j.com/docs/stable/rest-api-traverse.html
You can POST something like this against a node, there are options to only take unique nodes. Not sure but this might help with a cyclic graph.
{
"order" : "breadth_first",
"uniqueness" : "node_global",
"return_filter" : {
"language" : "builtin",
"name" : "all"
},
"max_depth" : 20
}

How to find distinct nodes in a Neo4j/Cypher query

I'm trying to do some pattern matching in neo4j/cypher and I came across this issue:
There are two types of graphs I want to search for:
Star graphs: A graph with one center node and multiple outgoing relationships.
n-length line graphs: A line graph with length n where none of the nodes are repeats (I have some bidirectional edges and cycles in my graph)
So the main problem is that when I do something such as:
MATCH a-->b, a-->c, a-->d
MATCH a-->b-->c-->d
Cypher doesn't guarantee (when I tried it) that a, b, c, and d are all different nodes. For small graphs, this can easily be fixed with
WHERE not(a=b) AND not(a=c) AND ...
But I'm trying to have graphs of size 10+, so checking equality between all nodes isn't a viable option. Afaik, RETURN DISTINCT does not work as well since it doesn't check equality among variables, only across different rows. Is there any simple way I can specify the query to make the differently named nodes distinct?
Old question, but look to APOC Path Expander procedures for how to address these kinds of use cases, as you can change the traversal uniqueness behavior for expansion (the same way you can when using the traversal API...which these procedures use).
Cypher implicitly uses RELATIONSHIP_PATH uniqueness, meaning that per path returned, a relationship must be unique, it cannot be used multiple times in a single path.
While this is good for queries where you need all possible paths, it's not a good fit for queries where you want distinct nodes or a subgraph or to prevent repeating nodes in a path.
For an n-length path, let's say depth 6 with only outgoing relationships of any type, we can change the uniqueness to NODE_PATH, where a node must be unique per path, no repeats in a path:
MATCH (n)
WHERE id(n) = 12345
CALL apoc.path.expandConfig(n, {maxLevel:6, uniqueness:'NODE_PATH'}) YIELD path
RETURN path
If you want all reachable nodes up to a certain depth (or at any depth by omitting maxLevel), you can use NODE_GLOBAL uniqueness, or instead just use apoc.path.subgraphNodes():
MATCH (n)
WHERE id(n) = 12345
CALL apoc.path.subgraphNodes(n, {maxLevel:6}) YIELD node
RETURN node
NODE_GLOBAL uniqueness means that across all paths that a node must be unique, it will only be visited once, and there will only be one path to a node from a given start node. This keeps the number of paths that need to be evaluated down significantly, but because of this behavior not all relationships will be traversed, if they expand to a node already visited.
You will not get relationships back with this procedure (you can use apoc.path.spanningTree() for that, although as previously mentioned not all relationships will be included, as we will only capture a single path to each node, not all possible paths to nodes). If you want all nodes up to a max level and all possible relationships between those nodes, then use apoc.path.subgraphAll():
MATCH (n)
WHERE id(n) = 12345
CALL apoc.path.subgraphAll(n, {maxLevel:6}) YIELD nodes, relationships
RETURN nodes, relationships
Richer options exist for label and relationship filtering, or filtering (whitelist, blacklist, endnode, terminator node) based on lists of pre-matched nodes.
We also support repeating sequences of relationships or node labels.
If you need filtering by node or relationship properties during expansion, then this won't be a good option as that feature is yet supported.

Resources