How BFS can from a tree on certain directed graph? - breadth-first-search

It is said said BFS always gives a tree, while DFS gives a forest.
But I do not understand how BFS can always give a tree.
Consider this graph and starting point b
How do we get a tree here?

Don't understand why you have a directed graph and an terminal starting point b.
From b, it can go nowhere but stay at b.
If it not directed, then it will be b->a->c->d, no matter it is BFS or DFS.
First time heard DFS returns a forest. Guess people think this because every time it reaches end it will return to parent node.

A tree is basically a connected graph(at least one path between every pair of nodes) with no cycles.
If we do BFS on a connected graph, we will visit every node in the graph and each node is visited only once. So, there is only one path from starting node to every node we visit, since we visit it only once. There is also only one path between any pair of nodes by same argument as above (it will make more sense if you imagine a graph and comprehend). So, there are no cycles and hence it's a tree.

BFS not gives a tree. It use queue ,in fact. And alongside with DFS, BFS is also an algorithm to go through the tree.

Related

Cypher - unlimited path length and large path length queries hang

I am using Neo4j Community 4.0.4.
I have encountered this issue using the offical Bolt driver for Python, but it is also completely reproducible in the Neo4j browser (version 4.0.7).
I have a very simple graph for now, consisting of the following node and relationship types:
(:Document)-[:contains]->(:Block)
(:Block)<-[:prev]-(:Block)-[:next]->(:Block)
There are only 75 nodes in my entire test database for now - 1 Document node and 74 Block nodes.
Running the following Cypher statement brings the CPU to 100% and the memory utilization rises indefinitely, after which I have to kill the session:
match (d:Doc{name: 'doc name'})
optional match (d)-[*]-(n)
return d,n
I also got the Java heap size error at some point.
It only starts to work if I set a strict upper bound on the relationship or specify the direction, e.g.:
optional match (d)-[*..5]->(n)
For example, this already does not work (the answer takes forever so I have to kill the session):
optional match (d)-[*..5]-(n)
Considering that (a) I am doing a strictly local graph traversal that graph databases are supposed to be exceptionally good at, (b) clusters associated with different starting nodes are NOT connected and (c) my test data set is tiny, how can this be happening?
From the symptoms it appears that the engine simply does not keep track of which nodes and relationships were already visited when preparing the results ... or am I missing something?
UPDATE:
This was just answered via the Neo4j community forum by a Neo4j staff member:
https://community.neo4j.com/t/getting-paths-of-any-length-or-long-paths-does-not-work/18298
I wrongly assumed that Cypher would just dynamically switch from the path uniqueness traversal to the node uniqueness traversal just because the operation following the match dealt only with nodes and not with relationships.
Poor assumption on my part - not only Cypher doesn't do it automatically, there is no way AT ALL in core Cypher to drop a path during traversal if all the nodes in the path were aleady visited.
The APOC-based solution was suggested:
match (d:Doc{name: 'doc name'})
CALL apoc.path.subgraphNodes(d, {}) YIELD node as n
return d, n
In my case I have disconnected sub-graphs that are tens of thousands of nodes each and are relatively dense. This came up when trying to delete a (:Doc) node and everything that's connected to it before re-loading a new version of the sub-graph into Neo4j:
disconnect delete d, n
I see this task of "removing the old version before re-loading" as a very common operational task for sub-graphs that many people may have in their use cases... Installing and managing additional libraries (like APOC or the Graph Data Science library) seems like an overkill for something this simple... But it's either that or making the deletions more targeted.
A MATCH clause avoids traversing the same relationship twice, so that would not be the issue. However, it can still travel between the same 2 nodes multiple times (as long as different relationships are used).
The main thing to consider is that variable-length relationship patterns have exponential (time and memory) complexity. If the nodes being traversed have an average of R relevant relationships, then the MATCH clause has to traverse about R**P possible paths of length P. The higher that P gets (especially with no upper bound), the worse it gets. But a high R also hurts.

Neo4j labels and properties, and their differences

Say we have a Neo4j database with several 50,000 node subgraphs. Each subgraph has a root. I want to find all nodes in one subgraph.
One way would be to recursively walk the tree. It works but can be thousands of trips to the database.
One way is to add a subgraph identifier to each node:
MATCH(n {subgraph_id:{my_graph_id}}) return n
Another way would be to relate each node in a subgraph to the subgraph's root:
MATCH(n)-[]->(root:ROOT {id: {my_graph_id}}) return n
This feels more "graphy" if that matters. Seems expensive.
Or, I could add a label to each node. If {my_graph_id} was "BOBS_QA_COPY" then
MATCH(n:BOBS_QA_COPY) return n
would scoop up all the nodes in the subgraph.
My question is when is it appropriate to use a garden-variety property, add relationships, or set a label?
Setting a label to identify a particular subgraph makes me feel weird, like I am abusing the tool. I expect labels to say what something is, not which instance of something it is.
For example, if we were graphing car information, I could see having parts labeled "FORD EXPLORER". But I am less sure that it would make sense to have parts labeled "TONYS FORD EXPLORER". Now, I could see (USER id:"Tony") having a relationship to a FORD EXPLORER graph...
I may be having a bout of "SQL brain"...
Let's work this through, step by step.
If there are N non-root nodes, adding an extra N ROOT relationships makes the least sense. It is very expensive in storage, it will pollute the data model with relationships that don't need to be there and that can unnecessarily complicate queries that want to traverse paths, and it is not the fastest way to find all the nodes in a subgraph.
Adding a subgraph ID property to every node is also expensive in storage (but less so), and would require either: (a) scanning every node to find all the nodes with a specific ID (slow), or (b) using an index, say, :Node(subgraph_id) (faster). Approach (b), which is preferable, would also require that all the nodes have the same Node label.
But wait, if approach 2(b) already requires all nodes to be labelled, why don't we just use a different label for each subgroup? By doing that, we don't need the subgraph_id property at all, and we don't need an index either! And finding all the nodes with the same label is fast.
Thus, using a per-subgroup label would be the best option.

Topological sort of DAG

I have a dag (Tree) in which the directed edges are only of three kinds :
Left to right (siblings)
Child to Parent
Parent to a child
Specifically , the problem is to evaluate an attribute parse tree, but it doesn't matter what the specific problem is.
Sort of :
What traversal is guaranteed to give a topological sort of the nodes ?
I think inorder will fail but some places it is suggested that inorder is the way to go. I know reverse post order woeks on general DAGS but I think there must be a simpler traversal for my case.
Since your graph is a DAG, and you therefore have no back-edges, you can use depth-first search to traverse your graph and add nodes to your sorted list in the order in which they come off the DFS stack.

Singly connected Graph?

A singly connected graph is a directed graph which has at most 1 path from u to v ∀ u,v.
I have thought of the following solution:
Run DFS from any vertex.
Now run DFS again but this time starting from the vertices in order of decreasing finish time. Run this DFS only for vertices which are not visited in some previous DFS. If we find a cross edge in the same component or a forward edge, then it is not Singly connected.
If all vertices are finished and no such cross of forward edges, then singly connected.
O(V+E)
Is this right? Or is there a better solution.
Update : atmost 1 simple path.
A graph is not singly connected if one of the two following conditions satisfies:
In the same component, when you do the DFS, you get a road from a vertex to another vertex that has already finished it's search (when it is marked BLACK)
When a node points to >=2 vertices from another component, if the 2 vertices have a connection then it is not singly connected. But this would require you to keep a depth-first forest.
A singly connected component is any directed graph belonging to the same entity.
It may not necessarily be a DAG and can contain a mixture of cycles.
Every node has atleast some link(in-coming or out-going) with atleast one node for every node in the same component.
All we need to do is to check whether such a link exists for the same component.
Singly Connected Component could be computed as follows:
Convert the graph into its undirected equivalent
Run DFS and set the common leader of each node
Run an iteration over all nodes.
If all the nodes have the same common leader, the undirected version of the graph is singly connected.
Else, it contains of multiple singly connected subgraphs represented by their corresponding leaders.
Is this right?
No, it's not right. Considering the following graph which is not singly connected. The first component comes from a dfs beginning with vertex b and the second component comes from a dfs beginning with vertex a.
The right one:
Do the DFS, the graph is singly connected if all of the three following conditions satisfies:
no foward edges
no cross edges in the same component
there is no more than 1 cross edges between any two of components

Cheapest cost traversal on Complete graph

I was wondering if there is an algorithm which:
given a fully connected graph of n-nodes (with different weights)... will give me the cheapest cycle to go from node A (a start node) to all other nodes, and return to node A? Is there a way to alter an algorithm like Primm's to accomplish this?
Thanks for your help
EDIT: I forgot to mention I'm dealing with a undirected graph so the in-degree = out-degree for each vertex.
Can you not modify Dijkstra, to find you the shortest path to all other nodes, and then when you have found it, the shortest path back to A?
You can try the iterative deepening A star search algorithm. It is always optimal. You need to define a heuristic though and this will depend on the problem you are trying to solve.
There need not be any such path. It exists if and only if the in-degree of every node equals its out-degree.
What you want is the cheapest Eulerian path. The problem of finding it is called the Traveling Salesman Problem. There is not, and cannot be, a fast algorithm to solve it.
Edit:
On second thought: The Traveling Salesman Problem searches for a tour that visits every node exactly once. You're asking for a tour that visits every node at least once. Thus, your problem might just be in P. I doubt it, though.

Resources