Find all nodes with two-way relationships starting from one specific node using cypher in neo4j - neo4j

neo4j nodes and relationships
This is quite a tough job. I'm trying to find all nodes with two-way relationships starting from a specific node. Based on the image above, I would like to find all two-way relationships starting from node 1. Only nodes with two-way relationships match. For example, node 1,3,4 matches and node 1,2,3 matches as two separate groups. However, if node 2 and 4 has a two-way relationship, then node 1,2,3,4 matches as one group. The main idea is that all nodes are linked both ways in such a group. My idea is to find all nodes with two-way relationships starting from 1 and continue processing, but I'm not able to continue. Can anyone help me with this problem, thanks a lot. By the way, only the largest 'two-way-circle' is needed.

Your problem looks a lot like finding strongly connected components in the graph. As defined in the docs.
A directed graph is strongly connected if there is a path between all
pairs of vertices ( nodes ). This algorithms treats the graph as directed, so
the direction of the relationship is important and strongly connected
compoment exists only if there are relationships between nodes in both
direction.
Check out more in the documentation. You will need neo4j-graph-algorithms.
Example query with writing back the component of the graph to the node.
CALL algo.scc('Label','C', {write:true,partitionProperty:'partition'})
YIELD loadMillis, computeMillis, writeMillis, setCount, maxSetSize, minSetSize
And then you can find your biggest component with the following query.
MATCH (u:Label)
RETURN distinct(u.partition) as partition,count(*) as size_of_partition
ORDER by size_of_partition DESC LIMIT 1

Related

Why so many db hit in neo4j?

There is total 1 Category node and 2 Template node in my case. I put an * in [*] to support more further scenarios. But why there are so many db hit in this cypher for current data?
It's probably the * in the relationship part of your query that's doing it.
While you've got only one Category node and two Template nodes, you've asked Neo4j to hop through any number of relationships to get from one to the other and not given it any help to narrow down the search besides specifying the starting node.
For example, if your Category was connected to 100,000 other nodes (of any label, not just Template) you've forced Neo4j to jump through every single one of them looking to see if there's a path to a Template node - and if those nodes have their own connections then they all need to be explored too, because the depth of the traversal isn't constrained.
If you know how Category and Template nodes can be connected in ways you're interested in (for example, if there's only every some specific set of relationships you want to traverse) then you'll radically improve the performance of the query. Equally, reducing the maximum length of the path will help.

Fast search for unconnected nodes in big neo4j graph

So, i've created a Neo4j graph database out of a relational database. The graph database has about 7 million nodes, and about 9 million relationships between the nodes.
I now want to find all nodes, that are not connected to nodes with a certain label (let's call them unconnected nodes). For example, i have nodes with the labels "Customer" and "Order" (let's call them top-level-nodes). I want to find all nodes that have no relationship from or to these top-level-nodes. The relationship doesn't have to be direct, the nodes can be connected via other nodes to the top-level-nodes.
I have a cypher query which would solve this problem:
MATCH (a) WHERE not ((a)-[*]-(:Customer)) AND not ((a)-[*]-(:Order)) RETURN a;
As you can imagine, the query will need a long time to execute, the performance is bad. Most likely because of the undirected relationship and because it doesn't matter via how many nodes the relationship can be made. However, the relationship directions don't matter, and i need to make sure that there is no path from any node to one of the top-level-nodes.
Is there any way to find the unconnected nodes faster ? Note that the database is really big, and there are more than 2 labels which mark top-level-nodes.
You could try this approach, which does involve more operations, but can be run in batches for better performance (see apoc.periodic.commit() in the APOC procedures library).
The idea is to first apply a label (say, :Unconnected) to all nodes in your graph (batch execute with apoc.periodic.commit), and then, taking batches of top level nodes with that label, matching to all nodes in the subgraphs extending from them and removing that label.
When you finally have run out of top level nodes with the :Unconnected label (meaning all top level nodes and their subgraphs no longer have this label) then the only nodes remaining in your graph with the :Unconnected label are not connected to your top level nodes.
Any approach to this kind of operation will likely be slow, but the advantage again is that you can process this in batches, and if you get interrupted, you can resume. Once your queries are done, all the relevant unconnected nodes are now labeled for further processing at your convenience.
Also, one last note, in Neo4j undirected relationships have no arrows in the syntax ()-[*]-().
MATCH (a)
WHERE
not (a:Customer OR a:Order)
AND shortestPath((a)-[*]-(:Customer)) IS NULL
AND shortestPath((a)-[*]-(:Order)) IS NULL
RETURN a;
If you could add rel-types it would be faster.
One further optimization could be to check the nodes of an :Customer path for an :Order node and vice versa. i.e.
NONE(n in nodes(path) WHERE n:Order)
In general, this might be rather a set operation, i.e.
expand around all order and customer nodes in parallel into two sets
and compute the overlap between the two sets.
Then remove the overlap from the total number of nodes.
I added an issue for apoc here to add such a function or procedure
https://github.com/neo4j-contrib/neo4j-apoc-procedures/issues/223

Most efficient way to get all connected nodes in neo4j

The answer to this question shows how to get a list of all nodes connected to a particular node via a path of known relationship types.
As a follow up to that question, I'm trying to determine if traversing the graph like this is the most efficient way to get all nodes connected to a particular node via any path.
My scenario: I have a tree of groups (group can have any number of children). This I model with IS_PARENT_OF relationships. Groups can also relate to any other groups via a special relationship called role playing. This I model with PLAYS_ROLE_IN relationships.
The most common question I want to ask is MATCH(n {name: "xxx") -[*]-> (o) RETURN o.name, but this seems to be extremely slow on even a small number of nodes (4000 nodes - takes 5s to return an answer). Note that the graph may contain cycles (n-IS_PARENT_OF->o, n<-PLAYS_ROLE_IN-o).
Is connectedness via any path not something that can be indexed?
As a first point, by not using labels and an indexed property for your starting node, this will already need to first find ALL the nodes in the graph and opening the PropertyContainer to see if the node has the property name with a value "xxx".
Secondly, if you now an approximate maximum depth of parentship, you may want to limit the depth of the search
I would suggest you add a label of your choice to your nodes and index the name property.
Use label, e.g. :Group for your starting point and an index for :Group(name)
Then Neo4j can quickly find your starting point without scanning the whole graph.
You can easily see where the time is spent by prefixing your query with PROFILE.
Do you really want all arbitrarily long paths from the starting point? Or just all pairs of connected nodes?
If the latter then this query would be more efficient.
MATCH (n:Group)-[:IS_PARENT_OF|:PLAYS_ROLE_IN]->(m:Group)
RETURN n,m

Nearest nodes to a give node, assigning dynamically weight to relationship types

I need to find the N nodes "nearest" to a given node in a graph, meaning the ones with least combined weight of relationships along the path from given node.
Is is possible to do so with a pure Cypher only solution? I was looking about path functions but couldn't find a viable way to express my query.
Moreover, is it possible to assign a default weight to a relationship at query time, according to its type/label (or somehow else map the relationship type to the weight)? The idea is to experiment with different weights without having to change a property for every relationship.
Otherwise I would have to change the weight property's value to each relationship and re-do it to before each query, which is very time-consuming (my graph has around 10M relationships).
Again, a pure Cypher solution would be the best, or please point me in the right direction.
Please use variable length Cypher queries to find the nearest nodes from a single node.
MATCH (n:Start { id: 0 }),
(n)-[:CONNECTED*0..2]-(x)
RETURN x
Note that the syntax [CONNECTED*0..2] is a range parameter specifying the min and max relationship distance from a given node, with relationship type CONNECTED.
You can swap this relationship type for other types.
In the case you wanted to traverse variably from the start node to surrounding nodes but constrain via a stop criteria to a threshold, that is a bit more difficult. For these kinds of things it is useful to get acquainted with Neo4j's spatial plugin. A good starting point to learn more about Neo4j spatial can be found in this blog post: http://neo4j.com/blog/neo4j-spatial-part1-finding-things-close-to-other-things
The post is a little outdated but if you do some Google searching you can find more updated materials.
GitHub repository: https://github.com/neo4j-contrib/spatial

Neo4j: Java API to compute intersection multiple properties

I'm very new in using Neo4j and have a question regarding the computation of intersections of nodes.
Let's suppose, I have the three properties A,B,C and I want to select only the nodes that have all three properties.
I created an index for the properties and thus, I can get all nodes having one of the properties. However, afterwards I have to merge the IndexHits. Is there a way to select directly all nodes having the three properties?
My second idea was to create a node for each property and connect other nodes by relationships. I can then iterate over all relationships and get for each property a list of nodes which are connected. But again, I have to compute the intersection afterwards.
Is there a function I miss here, since I suppose it's a standard problem.
Thanks a lot,
Benny
Do you also have the values you look for? You would start with the property that limits the amount of found nodes most.
MATCH (a:Label {property1:{value1}})
WHERE a.property2 = {value2} AND a.property3 = {value3}
RETURN a
For the Java API and lucene indexes:
gdb.index().forNodes("foo").query("p1:value1 p2:value2 p3:value3")
Lucene query syntax

Resources