Graph: Creating shortest marked path to connect two nodes - graph-algorithm

Given an undirected, unweighted graph in which some nodes are marked, is there an efficient way to find the unmarked nodes between node A and B which would create a "marked" path from A to B when they are marked? The number of those "bridge" nodes should be minimal.
For example, in the graph below there would be two minimal ways to connect node A to B. One possibility would be to mark the node labelled 1, the other possibility would be to mark node 2.

Convert your graph into a directed, weighted graph such that:
the weight of each edge going into a marked node is set to 0
the weight of each edge going into an unmarked node is 1
Find all lowest-cost paths from A to B.

Related

Contract a path (set of paths) into an edge (a set of edges) in neo 4j graph

I have a graph with several node types (PERSON, CITATION, JOURNAL) and relationship types (authored_in, published_in, cites, referred, etc.) in my neo4j graph instance. While I like to keep the graph as is, I also want to run closeness centrality, clustering coefficients algorithms only for the graph formed by the PERSON nodes and the edges obtained by contracting the paths connecting pairs of PERSON nodes into single EDGES.
I think I can replace the original graph with a PERSON-PERSON graph - but I do not want to lose the original graph. Is there a way to store the original graph into "supernodes"?
You could just use a new relationship type to connect the existing Person nodes. Nothing else needs to change.

Find all nodes with two-way relationships starting from one specific node using cypher in neo4j

neo4j nodes and relationships
This is quite a tough job. I'm trying to find all nodes with two-way relationships starting from a specific node. Based on the image above, I would like to find all two-way relationships starting from node 1. Only nodes with two-way relationships match. For example, node 1,3,4 matches and node 1,2,3 matches as two separate groups. However, if node 2 and 4 has a two-way relationship, then node 1,2,3,4 matches as one group. The main idea is that all nodes are linked both ways in such a group. My idea is to find all nodes with two-way relationships starting from 1 and continue processing, but I'm not able to continue. Can anyone help me with this problem, thanks a lot. By the way, only the largest 'two-way-circle' is needed.
Your problem looks a lot like finding strongly connected components in the graph. As defined in the docs.
A directed graph is strongly connected if there is a path between all
pairs of vertices ( nodes ). This algorithms treats the graph as directed, so
the direction of the relationship is important and strongly connected
compoment exists only if there are relationships between nodes in both
direction.
Check out more in the documentation. You will need neo4j-graph-algorithms.
Example query with writing back the component of the graph to the node.
CALL algo.scc('Label','C', {write:true,partitionProperty:'partition'})
YIELD loadMillis, computeMillis, writeMillis, setCount, maxSetSize, minSetSize
And then you can find your biggest component with the following query.
MATCH (u:Label)
RETURN distinct(u.partition) as partition,count(*) as size_of_partition
ORDER by size_of_partition DESC LIMIT 1

Fast search for unconnected nodes in big neo4j graph

So, i've created a Neo4j graph database out of a relational database. The graph database has about 7 million nodes, and about 9 million relationships between the nodes.
I now want to find all nodes, that are not connected to nodes with a certain label (let's call them unconnected nodes). For example, i have nodes with the labels "Customer" and "Order" (let's call them top-level-nodes). I want to find all nodes that have no relationship from or to these top-level-nodes. The relationship doesn't have to be direct, the nodes can be connected via other nodes to the top-level-nodes.
I have a cypher query which would solve this problem:
MATCH (a) WHERE not ((a)-[*]-(:Customer)) AND not ((a)-[*]-(:Order)) RETURN a;
As you can imagine, the query will need a long time to execute, the performance is bad. Most likely because of the undirected relationship and because it doesn't matter via how many nodes the relationship can be made. However, the relationship directions don't matter, and i need to make sure that there is no path from any node to one of the top-level-nodes.
Is there any way to find the unconnected nodes faster ? Note that the database is really big, and there are more than 2 labels which mark top-level-nodes.
You could try this approach, which does involve more operations, but can be run in batches for better performance (see apoc.periodic.commit() in the APOC procedures library).
The idea is to first apply a label (say, :Unconnected) to all nodes in your graph (batch execute with apoc.periodic.commit), and then, taking batches of top level nodes with that label, matching to all nodes in the subgraphs extending from them and removing that label.
When you finally have run out of top level nodes with the :Unconnected label (meaning all top level nodes and their subgraphs no longer have this label) then the only nodes remaining in your graph with the :Unconnected label are not connected to your top level nodes.
Any approach to this kind of operation will likely be slow, but the advantage again is that you can process this in batches, and if you get interrupted, you can resume. Once your queries are done, all the relevant unconnected nodes are now labeled for further processing at your convenience.
Also, one last note, in Neo4j undirected relationships have no arrows in the syntax ()-[*]-().
MATCH (a)
WHERE
not (a:Customer OR a:Order)
AND shortestPath((a)-[*]-(:Customer)) IS NULL
AND shortestPath((a)-[*]-(:Order)) IS NULL
RETURN a;
If you could add rel-types it would be faster.
One further optimization could be to check the nodes of an :Customer path for an :Order node and vice versa. i.e.
NONE(n in nodes(path) WHERE n:Order)
In general, this might be rather a set operation, i.e.
expand around all order and customer nodes in parallel into two sets
and compute the overlap between the two sets.
Then remove the overlap from the total number of nodes.
I added an issue for apoc here to add such a function or procedure
https://github.com/neo4j-contrib/neo4j-apoc-procedures/issues/223

How to find all nodes of a certain label, no matter how far from the starting Node

I have a Node (Me).
(Me) has a name="john smith" attribute and is a 'Person' label
This node has many relationships with other nodes of different 'labels'. And those nodes have relationships with other nodes. etc.. stretching out into the whole graph.
Somewhere out there are some nodes of type (p:Product). I don't know at what 'depth' they are away from the starting node: some will be near it and others will be further away.
I want to get all these (p:Product) nodes which have ultimate relations with the (Me) node.
Is this possible?
You can get a node in a specified depth using cypher:
start me = node(your me node)
match (me)-[*..100]-(p:Product) return p
This will find the closest Product node with a maximum relation depth set to 100.
You can find more on documentation website

Singly connected Graph?

A singly connected graph is a directed graph which has at most 1 path from u to v ∀ u,v.
I have thought of the following solution:
Run DFS from any vertex.
Now run DFS again but this time starting from the vertices in order of decreasing finish time. Run this DFS only for vertices which are not visited in some previous DFS. If we find a cross edge in the same component or a forward edge, then it is not Singly connected.
If all vertices are finished and no such cross of forward edges, then singly connected.
O(V+E)
Is this right? Or is there a better solution.
Update : atmost 1 simple path.
A graph is not singly connected if one of the two following conditions satisfies:
In the same component, when you do the DFS, you get a road from a vertex to another vertex that has already finished it's search (when it is marked BLACK)
When a node points to >=2 vertices from another component, if the 2 vertices have a connection then it is not singly connected. But this would require you to keep a depth-first forest.
A singly connected component is any directed graph belonging to the same entity.
It may not necessarily be a DAG and can contain a mixture of cycles.
Every node has atleast some link(in-coming or out-going) with atleast one node for every node in the same component.
All we need to do is to check whether such a link exists for the same component.
Singly Connected Component could be computed as follows:
Convert the graph into its undirected equivalent
Run DFS and set the common leader of each node
Run an iteration over all nodes.
If all the nodes have the same common leader, the undirected version of the graph is singly connected.
Else, it contains of multiple singly connected subgraphs represented by their corresponding leaders.
Is this right?
No, it's not right. Considering the following graph which is not singly connected. The first component comes from a dfs beginning with vertex b and the second component comes from a dfs beginning with vertex a.
The right one:
Do the DFS, the graph is singly connected if all of the three following conditions satisfies:
no foward edges
no cross edges in the same component
there is no more than 1 cross edges between any two of components

Resources