I am applying DFS on a graph and maintaining the state (discovered/not discovered, processed/not processed) of each node. When I push a new node onto the DFS stack, its state is discovered but not processed, and when a node is popped from the DFS stack, its state is discovered and processed.
Let's say I am visiting a new node Y from X in DFS and this is my condition for detecting cycle.
if(discovered(y) && !processed(y))
Is this condition correct for directed graphs?
Yes, I believe it is. My reasoning is to examine the possible states of the node like this:
y is not discovered (and not processed) - we were never here -> not a cycle
y is discovered but not processed - we are still in the middle of processing this node, i.e. we are currently inside its DFS subtree -> we found a cycle
y is discovered and processed - we already processed y and all paths originating from it, so we can't be in its descendant -> not a cycle
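The three-state argument above translates directly into code. Here is a minimal Python sketch (the recursive form and the dict-of-adjacency-lists representation are my choices, not from the question):

```python
from enum import Enum

class State(Enum):
    UNDISCOVERED = 0   # never pushed onto the DFS stack
    DISCOVERED = 1     # pushed, still being processed
    PROCESSED = 2      # popped, all descendants explored

def has_cycle(graph):
    """graph: dict mapping each node to a list of its successors."""
    state = {v: State.UNDISCOVERED for v in graph}
    for succs in graph.values():
        for v in succs:
            state.setdefault(v, State.UNDISCOVERED)

    def dfs(x):
        state[x] = State.DISCOVERED
        for y in graph.get(x, []):
            # discovered but not processed: y is still on the current
            # DFS path, so the edge x -> y closes a cycle
            if state[y] == State.DISCOVERED:
                return True
            if state[y] == State.UNDISCOVERED and dfs(y):
                return True
        state[x] = State.PROCESSED
        return False

    return any(state[v] == State.UNDISCOVERED and dfs(v) for v in list(graph))
```

Note the "discovered and processed" case: in the diamond graph a -> {b, c} -> d there is a cross edge into an already-processed d, and the check correctly reports no cycle.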
I have a process that lots of other processes link to in order to associate themselves with it. I built the system this way so that at any time I can easily see who is associated with the central process without also having to keep track of lists inside the application itself. I can simply call process_info(self(), links), and Erlang keeps track of which processes are still alive, etc.
At least, I thought so, until I found out that the list being returned for this process isn't accurate at the moment:
% P is my central pid
(node#host)212> P.
<0.803.0>
% Get all the links to it
(node#host)213> {links,L} = process_info(P,links).
{links,[<0.29179.155>,<0.6492.250>,<0.29990.293>|...]}
% Counting out all the links
(node#host)214> length(L).
31154
% Filtering out all of the dead processes, put them in Lf
(node#host)215> Lf = lists:filter(fun(Pid) -> try process_info(Pid,links) of {links,_} -> true; _ -> false catch _:_ -> false end end, L).
[<0.29179.155>,<0.6492.250>,<0.29990.293>,<0.23619.530>|...]
% Lf is only about half the size, half the linked processes are dead!
(node#host)216> length(Lf).
15654
% Proof that the links haven't changed in the interim
(node#host)217> {links,L} = process_info(P,links).
{links,[<0.29179.155>,<0.6492.250>,<0.29990.293>|...]}
The only thing I can think of that would cause this is network connectivity issues, since some of the links may come from processes on a node on another machine. Is that a possible explanation? And is there a way to make the process clean up its link list?
This was my fault. It's actually a problem with how I was checking whether processes are dead: process_info/2 doesn't work on processes on remote nodes. D'oh!
Detecting cycles in a single linked list is a well known problem. I know that this question has been asked a zillion times all over the internet. The reason why I am asking it again is I thought of a solution which I did not encounter at other places. (I admit I haven't searched that deeply either).
My solution is:
Given a linked list and a pointer to some node, break the link between node and node->next();
Then start at node->next() and traverse until you either hit the end (which means there was no loop) or reach node again (which means there was a loop).
Is there anything wrong/good about the above solution?
Note: Do join the link back once you are done.
That will work to detect complete cycles (i.e., cycles with a period of the whole list), e.g.:
A -> B -> C -> D -> A
But what if we have a cycle somewhere else in the list?
e.g.,
A -> B -> C -> D -> E -> C
I can't see that your algorithm will detect the cycle in this case.
Keep in mind that to detect the first case, we need not even break the link. We could just traverse the list and keep comparing the next link for each node with the head element to see if we'd started back at the start yet (or hit the end).
I guess the most trivial approach (not necessarily the best, but one that everybody should know how to implement in Java in a few lines of code) is to build a hash set of the nodes and keep adding them until you find one that you have already seen. It takes extra memory, though.
If you can mark nodes, start marking them until you find one you marked before (the hash set is essentially an external marker).
And check the usual graph theory books...
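A minimal Python version of the hash-set approach (a Java HashSet version is analogous); unlike the break-the-link idea, it also catches a cycle that does not loop back to the node you started from:

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

def has_cycle(head):
    """Walk the list, remembering every node seen.

    Meeting a node a second time means the list loops; reaching None
    means it terminates normally. Uses O(n) extra memory.
    """
    seen = set()
    node = head
    while node is not None:
        if node in seen:
            return True
        seen.add(node)
        node = node.next
    return False
```

The list itself is never modified, which also answers the concurrent-readers objection raised below in this thread.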
You are not allowed to break a link, even if you join it back at the end. What if other programs read the list at the same time?
The algorithm must not damage the list while working on it.
Using a Gremlin script and Neo4j, I am trying to find all paths between two nodes, descending at most 10 levels down. But all I get as a response from the REST API is a
java.lang.ArrayIndexOutOfBoundsException: -1
Here is the script:
x = g.v(2)
y = g.v(6)
x.both.loop(10){!it.object.equals(y)}.paths
I looked through the documentation, but couldn't find anything relevant for this use case.
In Gremlin, the argument to loop is the number of steps back that you wish to go, and the closure is evaluated to determine when to break out of the loop. In this case, because you have loop(10), it will go back far too many steps, to a point where the pipeline is not defined. With respect to the closure, you'll need to check not only whether the object is the one in question (in which case you should stop), but also whether you've already done 10 loops.
What you really want is something like this:
x.both.loop(1){!it.object.equals(y) && it.loops < 10}.paths
However, I should add that if there is a cycle in the graph, this will gladly traverse the cycle over and over and result in far too many paths. You can apply some clever filter and sideEffect to avoid visiting nodes multiple times.
For more information see the Loop Pattern Page on the Gremlin Wiki.
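The two ideas here, bounding the depth and not revisiting nodes already on the current path, can be sketched outside Gremlin. Below is a plain-Python analogue (the dict-of-neighbours representation and all names are mine, not part of the Gremlin API):

```python
def all_paths(graph, start, goal, max_depth=10):
    """All simple paths from start to goal, at most max_depth edges long.

    graph: dict node -> list of neighbours (list each undirected
    edge in both directions). Skipping nodes already on the current
    path is what keeps a cyclic graph from being traversed forever.
    """
    result = []

    def walk(node, path):
        if node == goal:
            result.append(path)
            return
        if len(path) - 1 >= max_depth:   # already max_depth edges deep
            return
        for nxt in graph.get(node, []):
            if nxt not in path:          # don't revisit a node on this path
                walk(nxt, path + [nxt])

    walk(start, [start])
    return result
```

In Gremlin the equivalent dedup would be done with a filter or sideEffect step, as the answer notes.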
Are digraph atomicity and isolation guarantees described anywhere?
Especially:
What state will another process see the digraph in if it tries to access it (vertices(), out_neighbours(), etc.) in the middle of del_vertex: the state before del_vertex, an intermediate state (the vertex deleted but not its edges, or the edges deleted but not the vertex), or the state after del_vertex (i.e. will the other process block until the operation is over)?
The same question regarding del_vertices.
If I understand correctly, digraph is implemented using 3 ETS tables. Is there any additional locking mechanism between them so that results are consistent?
Looking at the source of digraph.erl, I can see no extra locking going on:
del_vertex(G, V) ->
    do_del_vertex(V, G).
...
do_del_vertex(V, G) ->
    do_del_nedges(ets:lookup(G#digraph.ntab, {in, V}), G),
    do_del_nedges(ets:lookup(G#digraph.ntab, {out, V}), G),
    ets:delete(G#digraph.vtab, V).
So when you look at the digraph from another process you'll see the following states depending on timing:
Everything before the del_vertex/2
Some edges to and from the vertex deleted
The vertex itself deleted
The same happens, vertex after vertex, for del_vertices/2.
If you want more atomicity, create the digraph as protected and wrap it in its own server, e.g. a gen_server that also implements the parts of the functionality needing close access to the digraph.
I always mix up whether I use a stack or a queue for DFS or BFS. Can someone please provide some intuition about how to remember which algorithm uses which data structure?
A queue can generally be thought of as horizontal in structure, i.e. it has breadth/width - BFS - whereas
a stack is visualized as a vertical structure and hence has depth - DFS.
Draw a small graph on a piece of paper and think about the order in which nodes are processed in each implementation. How does the order in which you encounter the nodes and the order in which you process the nodes differ between the searches?
One of them uses a stack (depth-first) and the other uses a queue (breadth-first) (for non-recursive implementations, at least).
I remember it by keeping Barbecue in my mind. Barbecue starts with a 'B' and ends with a sound like 'q' hence BFS -> Queue and the remaining ones DFS -> stack.
BFS explores/processes the closest vertices first and then moves outwards, away from the source. Given this, you want a data structure that, when queried, gives you the oldest element by insertion order. A queue is what you need here, since it is first-in-first-out (FIFO).
A DFS, on the other hand, explores as far as possible along each branch first and then backtracks. For this, a stack works better, since it is last-in-first-out (LIFO).
Take it in Alphabetical order...
.... B(BFS).....C......D (DFS)....
.... Q(Queue)...R......S (Stack)...
BFS always uses a queue; DFS uses a stack. As the earlier explanations note, DFS relies on backtracking, and backtracking is naturally done with a stack.
BFS --> B --> Barbecue --> Queue
DFS --> S --> Stack
Don't remember anything.
Assuming the data structure used for the search is X:
Breadth first = nodes that entered X earlier have to be expanded on the tree first: X is a queue.
Depth first = nodes that entered X later must be expanded on the tree first: X is a stack.
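The "X is a queue / X is a stack" symmetry can be made literal: the routine below is one search in which the only difference between BFS and DFS is which end of the container the next node is popped from (the example graph and function names are mine):

```python
from collections import deque

def search(graph, start, use_queue):
    """One generic search; only the pop side differs between BFS and DFS."""
    container = deque([start])
    seen = {start}
    order = []
    while container:
        # queue -> take the oldest (BFS); stack -> take the newest (DFS)
        node = container.popleft() if use_queue else container.pop()
        order.append(node)
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                container.append(nxt)
    return order

def bfs(graph, start):
    return search(graph, start, use_queue=True)

def dfs(graph, start):
    return search(graph, start, use_queue=False)
```

(This iterative DFS marks nodes when they are pushed, so its exact visiting order can differ from a recursive DFS, but the stack-versus-queue point is unchanged.)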
In brief: Stack is Last-In-First-Out, which is DFS. Queue is First-In-First-Out, which is BFS.
BFS: Breadth => queue
DFS: Depth => stack
Refer to their structure.
The depth-first search uses a Stack to remember where it should go when it reaches a dead end.
DFSS
Stack (last in, first out; LIFO). In DFS we push on from the root to the farthest node, going as deep as possible, which is the same idea as LIFO.
Queue (first in, first out; FIFO). In BFS we visit level by level: after we visit the upper level of nodes, we visit the level below, which is the same idea as FIFO.
An easier way to remember, especially for young students, is to use similar acronym:
BFS => Boy FriendS in queue (for popular ladies apparently).
DFS is otherwise (stack).
I would like to share this answer:
https://stackoverflow.com/a/20429574/3221630
Taking BFS and replacing the queue with a stack reproduces the visiting order of DFS, though it uses more space than the usual DFS algorithm.
You can remember by making an acronym
BQDS
Beautiful Queen has Done Sins.
In Hindi: बहुरानी क्यु दर्द सहा ("Why did the daughter-in-law endure pain" - the initial sounds again spell B, Q, D, S).
Here is a simple analogy to remember. In BFS you go one level at a time, but in DFS you go as deep as possible down one branch before visiting the others - basically stacking up a big pile of stuff and then analyzing it piece by piece. So if that one is a STACK, the other one is a queue.
Remember: "piling up", "stacking up", as deep as possible (DFS).
If you visually rotate the 'q' symbol (as in queue) 180 degrees, you get a 'b' (as in BFS).
Otherwise it's a stack and DFS.