Find the nodes in a Neo subgraph - neo4j

I have a cyclic subgraph. I would like to know all the relationships in that subgraph. I don't know how deep the subgraph is, nor do I want to hardcode any relationship types.
The best thing I have found so far is driven by this snippet.
match(n:X)-[r*]->(m)
from r, I can find what I need. However, even for a small subgraph the cardinality of r* can be 30k or more. There is no point for Neo to calculate every path through the subgraph. I really just need the nodes or the individual relationships (preferred).
What is a way to just get the individual relationships in a subgraph? We're using Cypher.

Cypher provides no way to get all the relationships in a subgraph without following the paths. Besides, it has to explore those paths anyway in order to figure out what nodes and relationships belong to the subgraph.
To ensure that you get each relationship in a cyclic subgraph only once, you can do this:
MATCH p=(:Foo)-[*]->()
WITH RELATIONSHIPS(p) AS ps
UNWIND ps AS p
RETURN DISTINCT p;
Note, however, that variable-length path queries with no upper bound can be very expensive and may end up running "forever".
Alternate approach
If you can identify all the nodes in the desired subgraph, then there can be a more performant approach.
For example, let's suppose that all the nodes in the desired subgraph (and only those nodes) have the label X. In that case, this quick query will return all the relationships in the subgraph:
MATCH p=(:Foo)-[r]->()
RETURN r;

You can collect all nodes in a connected components with a breadth first or depth first search without filter.
The neo4j REST API has a traversal endpoint which can be used to do exactly that. It's not a Cypher query, but it could solve your problem: http://neo4j.com/docs/stable/rest-api-traverse.html
You can POST something like this against a node, there are options to only take unique nodes. Not sure but this might help with a cyclic graph.
{
"order" : "breadth_first",
"uniqueness" : "node_global",
"return_filter" : {
"language" : "builtin",
"name" : "all"
},
"max_depth" : 20
}

Related

How to get all connected nodes in neo4j

I want to get list of all connected nodes starting from node 0 as shown in the diagram
Based on your comment:
I want to get a list of all the connected nodes. For example in the
above case when I search for connected nodes for 0, it should return
nodes- 1,2,3
This query will do what you want:
MATCH ({id : 0})-[*]-(connected)
RETURN connected
The above query will return all nodes connected with a node with id=0 (I'm considering that the numbers inside the nodes are values of an id property) in any depth, both directions and considering any relationship type. Take a look in the section Relationships in depth of the docs.
While this will work fine for small graphs note that this is a very expensive operation. It will go through the entire graph starting from the start point ({id : 0}) considering any relationship type. This is really not a good idea for production environments.
If you wish to match the nodes that have a relationship to another node, you can use this:
MATCH (n) MATCH (n)-[r]-() RETURN n,r
It will return you all the nodes that have a relationship to another node or nodes, irrespective of the direction of the relationship.
If you wish to add a constraint you can do it this way:
MATCH (n:Label {id:"id"}) MATCH (n)-[r]-() RETURN n,r
For larger or more heavily interconnected graphs, APOC Procedures offers a more efficient means of traversal that returns all nodes in a subgraph.
As others have already mentioned, it's best to use labels on your nodes, and add either an index or a unique constraint on the label+property for fast lookup of your starting node.
Using a label of "Label", and a parameter of idParam, a query to get nodes of the subgraph with APOC would be:
MATCH (n:Label {id:$idParam})
CALL apoc.path.subgraphNodes(n, {minLevel:1}) YIELD node
RETURN node
Nodes will be distinct, and the starting node will not be returned with the rest.
EDIT
There's currently a restriction preventing usage of minLevel in subgraphNodes(), you can use either filter out the starting node yourself, or use apoc.path.expandConfig() using uniqueness:'NODE_GLOBAL' to get the same effect.

Optimizing Cypher Query

I am currently starting to work with Neo4J and it's query language cypher.
I have a multple queries that follow the same pattern.
I am doing some comparison between a SQL-Database and Neo4J.
In my Neo4J Datababase I habe one type of label (person) and one type of relationship (FRIENDSHIP). The person has the propterties personID, name, email, phone.
Now I want to have the the friends n-th degree. I also want to filter out those persons that are also friends with a lower degree.
FOr example if I want to search for the friends 3 degree I want to filter out those that are also friends first and/or second degree.
Here my query type:
MATCH (me:person {personID:'1'})-[:FRIENDSHIP*3]-(friends:person)
WHERE NOT (me:person)-[:FRIENDSHIP]-(friends:person)
AND NOT (me:person)-[:FRIENDSHIP*2]-(friends:person)
RETURN COUNT(DISTINCT friends);
I found something similiar somewhere.
This query works.
My problem is that this pattern of query is much to slow if I search for a higher degree of friendship and/or if the number of persons becomes more.
So I would really appreciate it, if somemone could help me with optimize this.
If you just wanted to handle depths of 3, this should return the distinct nodes that are 3 degrees away but not also less than 3 degrees away:
MATCH (me:person {personID:'1'})-[:FRIENDSHIP]-(f1:person)-[:FRIENDSHIP]-(f2:person)-[:FRIENDSHIP]-(f3:person)
RETURN apoc.coll.subtract(COLLECT(f3), COLLECT(f1) + COLLECT(f2) + me) AS result;
The above query uses the APOC function apoc.coll.subtract to remove the unwanted nodes from the result. The function also makes sure the collection contains distinct elements.
The following query is more general, and should work for any given depth (by just replacing the number after *). For example, this query will work with a depth of 4:
MATCH p=(me:person {personID:'1'})-[:FRIENDSHIP*4]-(:person)
WITH NODES(p)[0..-1] AS priors, LAST(NODES(p)) AS candidate
UNWIND priors AS prior
RETURN apoc.coll.subtract(COLLECT(DISTINCT candidate), COLLECT(DISTINCT prior)) AS result;
The problem with Cypher's variable-length relationship matching is that it's looking for all possible paths to that depth. This can cause unnecessary performance issues when all you're interested in are the nodes at certain depths and not the paths to them.
APOC's path expander using 'NODE_GLOBAL' uniqueness is a more efficient means of matching to nodes at inclusive depths.
When using 'NODE_GLOBAL' uniqueness, nodes are only ever visited once during traversal. Because of this, when we set the path expander's minLevel and maxLevel to be the same, the result are nodes at that level that are not present at any lower level, which is exactly the result you're trying to get.
Try this query after installing APOC:
MATCH (me:person {personID:'1'})
CALL apoc.path.expandConfig(me, {uniqueness:'NODE_GLOBAL', minLevel:4, maxLevel:4}) YIELD path
// a single path for each node at depth 4 but not at any lower depth
RETURN COUNT(path)
Of course you'll want to parameterize your inputs (personID, level) when you get the chance.

neo4j query to exclude nodes related to nodes with certain properties

I am trying to write a neo4j query where I only want to present nodes that are have no relation to nodes with a specific property. One way to think of it is where two separate graphs exist where one node has the property I want to exclude. I should get a result that only contains the graph of the set of nodes not connected to the node that has the property I want to exclude. This is what the graph looks like before my query
match (n) where not (n{property:'valueIWishToExclude'})--() return n
This is what the result of the query looks like
I only want to have the four connected nodes in my results. How can I set up a query that excludes the nodes that are not connected to the node with the property I wish to exclude?
In fact you need those nodes from which there is no path to the node that should be excluded. You can use the shortestPath function and ALL predicate:
match (ex) where n.property = 'valueIWishToExclude'
with collect(ex) as exn
match (n) where (not n.property = 'valueIWishToExclude') and
ALL(e in exn where not shortestPath( (n)-[*]-(e) ) is null)
return n
You are almost there, just add in the relationship in your query to only get the nodes that are related to each other
MATCH (n:label) -[:RELATED]->() where n.property<>'exclude'
RETURN n
That should return only the nodes connected to each other, as the other nodes do not have that relationship.
Let me know if that worked for you.
You may want to alter your wording a bit, what you're asking for in this question, and what you really want, are not the same thing.
In Neo4j (and most graph databases), the phrase "nodes that have no relation to..." means nodes that are not connected by a relationship to the node in question.
In that context, in your right graph (assuming the one node selected is the node marked as excluded), one node would fit the criteria and be returned as a possible result, the topmost node, since it doesn't have a relationship to the node you want to exclude; It is however two relationships removed from the excluded node.
You seem to be asking for something else, though. You seem to want nodes that are not in the same subgraph as the node to exclude. Or, alternately, nodes that have no path to the excluded node.
Make sure on future queries you're clear about what you're asking, or you'll get answers that have no relevance to what you really want.
One approach that will work is to first find all nodes within the subgraph of the excluded node, and then return all nodes that are not in those subgraph nodes.
You'll want to install APOC Procedures so you can make use of a fast means of obtaining nodes within the subgraph.
You'll also want to use labels in your graph, and maybe put an index on the property you're searching for as this will make your search fast. As it is now, your query must examine every node in your entire database to find nodes with the property in question, and that will become slower and slower as your graph grows.
Your query might look like this (using 'Label' as a stand-in for the node label):
MATCH (n:Label{propertyToExclude:'valueToExclude'})
CALL apoc.path.expandConfig(n, {bfs:true, uniqueness:"NODE_GLOBAL"}) YIELD path
WITH COLLECT(DISTINCT LAST(NODES(path))) as subgraph
MATCH (n)
WHERE NOT n in subgraph
RETURN n

How to filter results by node label in neo4j cypher?

I have a graph database that maps out connections between buildings and bus stations, where the graph contains other connecting pieces like roads and intersections (among many node types).
What I'm trying to figure out is how to filter a path down to only return specific node types. I have two related questions that I'm currently struggling with.
Question 1: How do I return the labels of nodes along a path?
It seems like a logical first step is to determine what type of nodes occur along the path.
I have tried the following:
MATCH p=(a:Building)­-[:CONNECTED_TO*..5]­-(b:Bus)
WITH nodes(p) AS nodes
RETURN DISTINCT labels(nodes);
However, I'm getting a type exception error that labels() expects data of type node and not Collection. I'd like to dynamically know what types of nodes are on my paths so that I can eventually filter my paths.
Question 2: How can I return a subset of the nodes in a path that match a label I identified in the first step?
Say I found that that between (a:Building) and (d1:Bus) and (d2:Bus) I can expect to find (:Intersection) nodes and (:Street) nodes.
This is a simplified model of my graph:
(a:Building)­­--(:Street)­--­(:Street)--­­(b1:Bus)
\­­(:Street)--­­(:Intersection)­­--(:Street)--­­(b2:Bus)
I've written a MATCH statement that would look for all possible paths between (:Building) and (:Bus) nodes. What would I need to do next to filter to selectively return the Street nodes?
MATCH p=(a:Building)-[r:CONNECTED_TO*]-(b:Bus)
// Insert logic to only return (:Street) nodes from p
Any guidance on this would be greatly appreciated!
To get the distinct labels along matching paths:
MATCH p=(a:Building)-[:CONNECTED_TO*..5]-(b:Bus)
WITH NODES(p) AS nodes
UNWIND nodes AS n
WITH LABELS(n) AS ls
UNWIND ls AS label
RETURN DISTINCT label;
To return the nodes that have the Street label.
MATCH p=(a:Building)-[r:CONNECTED_TO*]-(b:Bus)
WITH NODES(p) AS nodes
UNWIND nodes AS n
WITH n
WHERE 'Street' IN LABELS(n)
RETURN n;
Cybersam's answers are good, but their output is simply a column of labels...you lose the path information completely. So if there are multiple paths from a :Building to a :Bus, the first query will only output all labels in all nodes in all patterns, and you can't tell how many paths exist, and since you lose path information, you cannot tell what labels are in some paths but not others, or common between some paths.
Likewise, the second query loses path information, so if there are multiple paths using different streets to get from a :Building to a :Bus, cybersam's query will return all streets involved in all paths. It is possible for it to output all streets in your graph, which doesn't seem very useful.
You need queries that preserve path information.
For 1, finding the distinct labels on nodes on each path I would offer this query:
MATCH p=(:Building)-[:CONNECTED_TO*..5]-(:Bus)
WITH NODES(p) AS nodes
WITH REDUCE(myLabels = [], node in nodes | myLabels + labels(node)) as myLabels
RETURN DISTINCT myLabels
For 2, this query preserves path information:
MATCH p=(:Building)-[:CONNECTED_TO*..5]-(:Bus)
WITH NODES(p) AS nodes
WITH FILTER(node in nodes WHERE (node:Street)) as pathStreets
RETURN pathStreets
Note that these are both expensive operations, as they perform a cartesian product of all buildings and all busses, as in the queries in your description. I highly recommend narrowing down the buildings and busses you're matching upon, hopefully to very few or specific buildings at least.
I also encourage limiting how deep you're looking in your pattern. I get the idea that many, if not most, of your nodes in your graph are connected by :CONNECTED_TO relationships, and if we don't cap that to a reasonable amount, your query could be finding every single path through your entire graph, no matter how long or convoluted or nonsensical, and I don't think that's what you want.

Extract subgraph in neo4j

I have a large network stored in Neo4j. Based on a particular root node, I want to extract a subgraph around that node and store it somewhere else. So, what I need is the set of nodes and edges that match my filter criteria.
Afaik there is no out-of-the-box solution available. There is a graph matching component available, but it works only for perfect matches. The Neo4j API itself defines only graph traversal which I can use to define which nodes/edges should be visited:
Traverser exp = Traversal
.description()
.breadthFirst()
.evaluator(Evaluators.toDepth(2))
.traverse(root);
Now, I can add all nodes/edges to sets for all paths, but this is very inefficient. How would you do it? Thanks!
EDIT Would it make sense to add the last node and the last relationship of each traversal to the subgraph?
As for graph matching, that has been superseded by http://docs.neo4j.org/chunked/snapshot/cypher-query-lang.html which would fit nicely, and supports fuzzy matchin with optional relationships.
For subgraph representation, I would use the Cypher output to maybe construct new Cypher statements for recreating the graph, much like a SQL export, something like
start n=node:node_auto_index(name='Neo')
match n-[r:KNOWS*]-m
return "create ({name:'"+m.name+"'});"
http://console.neo4j.org/r/pqf1rp for an example
I solved it by constructing the induced subgraph based on all traversal endpoints.
Building the subgraph from the set of last nodes and edges of every traversal does not work, because edges that are not part of any shortest paths would not be included.
The code snippet looks like this:
Set<Node> nodes = new HashSet<Node>();
Set<Relationship> edges = new HashSet<Relationship>();
for (Node n : traverser.nodes())
{
nodes.add(n);
}
for (Node node : nodes)
{
for (Relationship rel : node.getRelationships())
{
if (nodes.contains(rel.getOtherNode(node)))
edges.add(rel);
}
}
Every edge is added twice. One time for the outgoing node and one time for the incoming node. Using a Set, I can ensure that it's in the collection only once.
It is possible to iterate over incoming/outgoing edges only, but it is unclear how loops (edge from a node to itself) are handled. To which category do they belong to? This snippet does not have this issue.
See dumping the database to cypher statements
dump START n=node({self}) MATCH p=(n)-[r:KNOWS*]->(m) RETURN n,r,m;
There's also an example for importing the subgraph of first database (db1) into a second (db2).

Resources