Are digraph atomicity and isolation guarantees described anywhere?
Especially:
What state will another process see the digraph in if it tries to access it (vertices(), out_neighbours(), etc.) in the middle of del_vertex: the state before del_vertex, an intermediate state (i.e. the vertex is deleted but its edges are not, or the edges are deleted but the vertex is not), or the state after del_vertex (i.e. the other process is blocked until the operation is over)?
The same question regarding del_vertices.
If I understand right, digraph is implemented using 3 ETS tables. Is there any additional locking mechanism between them in order for the results to be consistent?
Looking at the source of digraph.erl I can see no extra locking going on.
del_vertex(G, V) ->
    do_del_vertex(V, G).

...

do_del_vertex(V, G) ->
    do_del_nedges(ets:lookup(G#digraph.ntab, {in, V}), G),
    do_del_nedges(ets:lookup(G#digraph.ntab, {out, V}), G),
    ets:delete(G#digraph.vtab, V).
So when you look at the digraph from another process you'll see the following states depending on timing:
Everything before the del_vertex/2
Some edges to and from the vertex deleted
The vertex itself deleted
The same happens vertex after vertex for del_vertices/2.
If you want more atomicity, create the digraph as protected and wrap it in its own server, e.g. a gen_server, which usually also implements the part of the functionality that needs close access to the digraph.
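If it helps, here is a minimal sketch of such a wrapper, assuming a locally registered server and only a handful of operations; the module name and API are made up for illustration:

-module(graph_server).
-behaviour(gen_server).

-export([start_link/0, add_vertex/1, add_edge/2, del_vertex/1, out_neighbours/1]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

add_vertex(V)     -> gen_server:call(?MODULE, {add_vertex, V}).
add_edge(V1, V2)  -> gen_server:call(?MODULE, {add_edge, V1, V2}).
del_vertex(V)     -> gen_server:call(?MODULE, {del_vertex, V}).
out_neighbours(V) -> gen_server:call(?MODULE, {out_neighbours, V}).

init([]) ->
    %% protected (the default): only this owning process can update the ETS
    %% tables. Since every operation, reads included, goes through the
    %% server's mailbox here, callers see the graph either before or after
    %% a del_vertex, never in between.
    {ok, digraph:new([protected])}.

handle_call({add_vertex, V}, _From, G) ->
    {reply, digraph:add_vertex(G, V), G};
handle_call({add_edge, V1, V2}, _From, G) ->
    {reply, digraph:add_edge(G, V1, V2), G};
handle_call({del_vertex, V}, _From, G) ->
    {reply, digraph:del_vertex(G, V), G};
handle_call({out_neighbours, V}, _From, G) ->
    {reply, digraph:out_neighbours(G, V), G}.

handle_cast(_Msg, G) ->
    {noreply, G}.

Reads pay the price of a message round-trip, but in exchange no caller can ever observe the half-deleted states listed above.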
I have encountered many code fragments like the following for choosing an action, which include a mix of torch.no_grad and detach (where actor is some actor and SomeDistribution your preferred distribution), and I'm wondering whether they make sense:
def f():
    with torch.no_grad():
        x = actor(observation)
    dist = SomeDistribution(x)
    sample = dist.sample()
    return sample.detach()
Is the use of detach in the return statement not unnecessary, as x has its requires_grad already set to False, so all computations using x should already be detached from the graph? Or do the computations after the torch.no_grad wrapper somehow end up on the graph again, so we need to detach them once again in the end (in which case it seems to me that no_grad would be unnecessary)?
Also, if I'm right, I suppose instead of omitting detach one could also omit torch.no_grad, and end up with the same functionality, but worse performance, so torch.no_grad is to be preferred?
Whether it is redundant depends on the internals of actor and SomeDistribution. In general, there are three cases I can think of where detach would be necessary in this code. Since you've already observed that x has requires_grad set to False, cases 2 and 3 don't apply to your specific case.
1. If SomeDistribution has internal parameters (leaf tensors with requires_grad=True) then dist.sample() may result in a computation graph connecting sample to those parameters. Without detaching, that computation graph, including those parameters, would be unnecessarily kept in memory after returning.
2. The default behavior within a torch.no_grad context is to return the result of tensor operations having requires_grad set to False. However, if actor(observation) for some reason explicitly sets requires_grad of its return value to True before returning, then a computation graph may be created that connects x to sample. Without detaching, that computation graph, including x, would be unnecessarily kept in memory after returning.
3. This one seems even more unlikely, but if actor(observation) actually just returns a reference to observation, and observation.requires_grad is True, then a computation graph all the way from observation to sample may be constructed during dist.sample().
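To make case 1 above concrete, here is a small self-contained sketch; ToyDistribution and toy_actor are made up for illustration, with the distribution drawing reparameterized samples through an internal learnable parameter:

import torch

class ToyDistribution:
    def __init__(self, loc):
        self.loc = loc
        self.scale = torch.ones(1, requires_grad=True)  # internal leaf parameter

    def sample(self):
        # reparameterized draw: records scale in the graph (we are outside no_grad here)
        return self.loc + self.scale * torch.randn(1)

def toy_actor(observation):
    return observation * 2.0  # stand-in for a policy network

observation = torch.zeros(1)

with torch.no_grad():
    x = toy_actor(observation)        # x.requires_grad is False

dist = ToyDistribution(x)
sample = dist.sample()

print(sample.requires_grad)           # True: the sample is attached to dist.scale
print(sample.detach().requires_grad)  # False: detach drops that graph

Running it prints True and then False: the undetached sample drags the graph (and dist.scale) along, which is exactly the memory issue described in case 1.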
As for the suggestion of removing the no_grad context and relying only on detach, this may result in the construction of a computation graph connecting observation (if it requires gradients) and/or the parameters of the distribution (if it has any) to x. The graph would be discarded after detach, but it does take time and memory to create it, so there may be a performance penalty.
In conclusion, it's safer to do both no_grad and detach, though the necessity of either depends on the details of the distribution and actor.
The .split_off method on std::collections::LinkedList is described as having O(n) time complexity. From the docs:
pub fn split_off(&mut self, at: usize) -> LinkedList<T>
Splits the list into two at the given index. Returns everything after the given index, including the index.
This operation should compute in O(n) time.
Why not O(1)?
I know that linked lists are not trivial in Rust. There are several resources going into the hows and whys, like this book and this article among several others, but I haven't had the chance to dive into those or the standard library's source code yet.
Is there a concise explanation about the extra work needed when splitting a linked list in (safe) Rust?
Is this the only way? And if not why was this implementation chosen?
The method LinkedList::split_off(&mut self, at: usize) first has to traverse the list from the start (or the end) to the position at, which takes O(min(at, n - at)) time. The actual split-off is a constant-time operation (as you said). And since this min() expression is confusing, we just replace it by the larger n, which is a valid upper bound. Thus: O(n).
Why was the method designed like that? The problem goes deeper than this particular method: most of the LinkedList API in the standard library is not really useful.
Due to its cache unfriendliness, a linked list is often a bad choice to store sequential data. But linked lists have a few nice properties which make them the best data structure for a few, rare situations. These nice properties include:
Inserting an element in the middle in O(1), if you already have a pointer to that position
Removing an element from the middle in O(1), if you already have a pointer to that position
Splitting the list into two lists at an arbitrary position in O(1), if you already have a pointer to that position
Notice anything? The linked list is designed for situations where you already have a pointer to the position that you want to do stuff at.
Rust's LinkedList, like many others, just stores a pointer to the start and one to the end. To have a pointer to an element inside the linked list, you need something like an iterator; in our case, that's IterMut. An iterator over a collection can function like a pointer to a specific element and can be advanced carefully (i.e. not with a for loop). And in fact, there is IterMut::insert_next, which allows you to insert an element in the middle of the list in O(1). Hurray!
But this method is unstable. And methods to remove the current element or to split the list off at that position are missing. Why? Because of the vicious circle that is:
1. LinkedList lacks almost all features that make linked lists useful at all
2. Thus (nearly) everyone recommends not to use it
3. Thus (nearly) no one uses LinkedList
4. Thus (nearly) no one cares about improving it
5. Goto 1
Please note that there are a few brave souls occasionally trying to improve the situation. There is the tracking issue about insert_next, where people argue that Iterator might be the wrong concept to perform these O(1) operations and that we want something like a "cursor" instead. And here someone suggested a bunch of methods to be added to IterMut (including cut!).
Now someone just has to write a nice RFC and someone needs to implement it. Maybe then LinkedList won't be nearly useless anymore.
Edit 2018-10-25: someone did write an RFC. Let's hope for the best!
Edit 2019-02-21: the RFC was accepted! Tracking issue.
Maybe I'm misunderstanding your question, but in a linked list, the links of each node have to be followed to proceed to the next node. If you want to get to the third node, you start at the first, follow its link to the second, then finally arrive at the third.
This traversal's complexity is proportional to the target node index n because n nodes are processed/traversed, so it's a linear O(n) operation, not a constant-time O(1) operation. The part where the list is "split off" is of course constant time, but the overall split operation is dominated by the O(n) cost of getting to the split-off node before the split can even be made.
One way in which it could be O(1) would be if a pointer existed to the node after which the list is split off, but that is different from specifying a target node index. Alternatively, a lookup structure could be kept that maps each node index to the corresponding node pointer, but that would mean extra space and processing overhead to keep it in sync with list operations.
pub fn split_off(&mut self, at: usize) -> LinkedList<T>
Splits the list into two at the given index. Returns everything after the given index, including the index.
This operation should compute in O(n) time.
The documentation is either:
unclear, if n is supposed to be the index,
pessimistic, if n is supposed to be the length of the list (the usual meaning).
The proper complexity, as can be seen in the implementation, is O(min(at, n - at)) (whichever is smaller). Since at cannot exceed n, the documentation is correct that O(n) is a bound on the complexity (reached for at = n / 2); however, such a loose bound is unhelpful.
That is, the fact that list.split_off(5) takes the same time whether list.len() is 10 or 1,000,000 is quite important!
As to why this complexity: it is an inherent consequence of the structure of a doubly-linked list. There is no O(1) indexing operation in a linked list, after all. The operation implemented in C, C++, C#, D, F#, ... would have the exact same complexity.
Note: I encourage you to write a pseudo-code implementation of a linked-list with the split_off operation; you'll realize this is the best you can get without altering the data-structure to be something else.
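Here is roughly what such a sketch could look like, using a toy singly linked list instead of the standard library's doubly linked one (which can also start the walk from the back, but the idea is identical):

// A minimal singly linked list, only to illustrate why split_off(at) has to
// walk the list: the node before the split point must be found first.
struct Node<T> {
    elem: T,
    next: Option<Box<Node<T>>>,
}

pub struct List<T> {
    head: Option<Box<Node<T>>>,
}

impl<T> List<T> {
    pub fn new() -> Self {
        List { head: None }
    }

    pub fn push_front(&mut self, elem: T) {
        self.head = Some(Box::new(Node { elem, next: self.head.take() }));
    }

    // Split at `at`, returning the tail. Written recursively for clarity: each
    // call follows exactly one link, so reaching the split point costs O(at);
    // the split itself is a single take(), i.e. O(1).
    pub fn split_off(&mut self, at: usize) -> List<T> {
        fn walk<T>(link: &mut Option<Box<Node<T>>>, at: usize) -> Option<Box<Node<T>>> {
            if at == 0 {
                link.take() // detach everything from here on
            } else {
                match link {
                    Some(node) => walk(&mut node.next, at - 1), // follow one link
                    None => panic!("Cannot split off at a nonexistent index"),
                }
            }
        }
        List { head: walk(&mut self.head, at) }
    }
}

fn main() {
    let mut list = List::new();
    for x in [5, 4, 3, 2, 1] {
        list.push_front(x); // list is now 1, 2, 3, 4, 5
    }
    let tail = list.split_off(2); // list: 1, 2 -- tail: 3, 4, 5

    let mut cur = tail.head.as_deref();
    while let Some(node) = cur {
        print!("{} ", node.elem); // prints: 3 4 5
        cur = node.next.as_deref();
    }
    println!();
}

Following at links one by one is unavoidable before the single take() that performs the split, which is exactly the point made above.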
This question has bothered me for a long time. A basic vehicle counting program includes: 1. recognize a vehicle; 2. track the vehicle by its features.
However, if vehicle #1 is found at time t, then at t+1 the program starts to track it, but #1 can also be found again by the recognition process, so at t+2 the program will be tracking two vehicles even though there is actually just one (#1) in the frame. How can I keep an already-tracked vehicle from being detected again?
Thanks in advance!
If I understood correctly, you are concerned about detecting the object that you are already tracking (lack of detector/tracker communication). In that case you can either:
Pre-check - during detection, exclude the areas where you already track objects, or
Post-check - discard detected objects that are near tracked ones (if "selective" detection is not possible for your approach for some reason)
There are several possible implementations.
Mask. Create a binary mask where areas near tracked objects are "marked" (e.g. ones near tracked objects and zeros everywhere else). Given such a mask, before running detection at a particular location you can quickly check whether something is already being tracked there and abort the detection (Pre-check approach), or remove the detected object if you stick with the Post-check approach (see the sketch at the end of this answer).
Brute-force. Calculate the distance between a particular location and each of the tracked objects (you can also check overlapping area and other characteristics). You can then discard detections that are too close and/or too similar to already tracked objects.
Let's consider which way is better (and when).
Mask needs O(N) operations to add all tracked objects to the mask and O(M) operations to check all locations of interest. That's O(N + M) = O(max(N, M)), where N is number of tracked objects and M is number of checked locations (detected objects, for example). Which number (N or M) will be bigger depends on your application. Additional memory is also needed to hold the binary mask (usually it is not very important, but again, it depends on the application).
Brute-force needs O(N * M) operations (each of M locations is checked against N candidates). It doesn't need additional memory and allows more complex logic during the checks. For example, if an object suddenly changes size/color/whatever within one frame, we should probably not track it (since it may be a completely different object occluding the original one) and do something else instead.
To sum up:
Mask is asymptotically better when you have a lot of objects. It is almost essential if you do something like a sliding window search during detection, and can exclude some areas (since in this case you will likely have a large M). You will likely use it with Pre-check.
Brute-force is OK when you have few objects and need to do checks that involve different properties. It makes most sense to use it with Post-check.
If you happen to need something in between, you'll have to be more creative and either encode object properties in the mask somehow (to achieve constant look-up time) or use more complex data structures (to speed up the "Brute-force" search).
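If it helps, here is a small Python sketch of the Mask idea used as a Post-check; the (x, y, w, h) box format, the margin and the center-point test are assumptions made for illustration:

import numpy as np

def build_tracked_mask(frame_shape, tracked_boxes, margin=10):
    """Mark areas near tracked objects with True. Boxes are (x, y, w, h)."""
    h_img, w_img = frame_shape[:2]
    mask = np.zeros((h_img, w_img), dtype=bool)
    for (x, y, w, h) in tracked_boxes:
        x0, y0 = max(0, x - margin), max(0, y - margin)
        x1, y1 = min(w_img, x + w + margin), min(h_img, y + h + margin)
        mask[y0:y1, x0:x1] = True
    return mask

def filter_new_detections(detections, mask):
    """Post-check: drop detections whose center falls in an already-tracked area."""
    return [(x, y, w, h) for (x, y, w, h) in detections
            if not mask[y + h // 2, x + w // 2]]

# One tracked vehicle, two fresh detections; the detection overlapping the
# tracked area is discarded, the other is kept as a genuinely new object.
mask = build_tracked_mask((480, 640), tracked_boxes=[(100, 100, 80, 60)])
print(filter_new_detections([(110, 105, 75, 55), (400, 300, 90, 70)], mask))
# -> [(400, 300, 90, 70)]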
I am trying to register a couple of processes with atom names created dynamically, like so:
keep_alive(Name, Fun) ->
    register(Name, Pid = spawn(Fun)),
    on_exit(Pid, fun(_Why) -> keep_alive(Name, Fun) end).

monitor_some_processes(N) ->
    %% create N processes that restart automatically when killed
    for(1, N, fun(I) ->
        Mesg = io_lib:format("I'm process ~p~n", [I]),
        Name = list_to_atom(io_lib:format("zombie~p", [I])),
        keep_alive(Name, fun() -> zombie(Mesg) end)
    end).

for(N, N, Fun) -> [Fun(N)];
for(I, N, Fun) -> [Fun(I)|for(I+1, N, Fun)].

zombie(Mesg) ->
    io:format(Mesg),
    timer:sleep(3000),
    zombie(Mesg).
That list_to_atom/1 call though is resulting in an error:
43> list_to_atom(io_lib:format("zombie~p", [1])).
** exception error: bad argument
in function list_to_atom/1
called as list_to_atom([122,111,109,98,105,101,"1"])
What am I doing wrong?
Also, is there a better way of doing this?
TL;DR
You should not dynamically generate atoms. From what your code snippet indicates you are probably trying to find some way to flexibly name processes, but atoms are not it. Use a K/V store of some type instead of register/2.
Discussion
Atoms are restrictive for a reason. They should represent something about the eternal structure of your program, not the current state of it. Atoms are so restrictive that I imagine what you really want to be able to do is register a process using any arbitrary Erlang value, not just atoms, and reference them more freely.
If that is the case, pick from one of the following four approaches:
1. Keep Key/Value pairs somewhere to act as your own registry. This could be a separate process or a list/tree/dict/map handler to store key/value pairs of #{Name => Pid} (a minimal sketch follows this list).
2. Use the global module (which, like gproc below, has features that work across a cluster).
3. Use a registry solution like Ulf Wiger's nice little project gproc. It is awesome for the times when you actually need it (which are, honestly, not as often as I see it used). Here is a decent blog post about its use and why it works the way it does: http://blog.rusty.io/2009/09/16/g-proc-erlang-global-process-registry/. An added advantage of gproc is that nearly every Erlanger you'll meet is at least passingly familiar with it.
4. A variant on the first option: structure your program as a tree of service managers and workers (as in the "Service -> Worker Pattern"). A side effect of this pattern is that very often the service manager winds up needing to monitor its processes for one reason or another if you're doing anything non-trivial, and that makes it an ideal candidate for a place to keep a Key/Value registry of Pids. It is quite common for this sort of pattern to wind up emerging naturally as a program matures, especially if that program has high robustness requirements. Structuring it as a set of semi-independent services with an abstract management interface at the top of each from the outset is often a handy evolutionary shortcut.
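As a minimal sketch of the first option, here is a tiny gen_server keeping a #{Name => Pid} map, so any Erlang term (not just an atom) can serve as a name; the module and function names are made up for illustration:

-module(proc_registry).
-behaviour(gen_server).

-export([start_link/0, register_name/2, whereis_name/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

register_name(Name, Pid) ->
    gen_server:call(?MODULE, {register, Name, Pid}).

whereis_name(Name) ->
    gen_server:call(?MODULE, {whereis, Name}).

init([]) ->
    {ok, #{}}.   %% the whole registry state is a map of Name => Pid

handle_call({register, Name, Pid}, _From, Map) ->
    case maps:is_key(Name, Map) of
        true ->
            {reply, {error, already_registered}, Map};
        false ->
            erlang:monitor(process, Pid),   %% so dead processes get cleaned up
            {reply, ok, Map#{Name => Pid}}
    end;
handle_call({whereis, Name}, _From, Map) ->
    {reply, maps:get(Name, Map, undefined), Map}.

handle_cast(_Msg, Map) ->
    {noreply, Map}.

%% Remove entries whose process has exited.
handle_info({'DOWN', _Ref, process, Pid, _Reason}, Map) ->
    {noreply, maps:filter(fun(_Name, P) -> P =/= Pid end, Map)};
handle_info(_Other, Map) ->
    {noreply, Map}.

With something like this, the snippet above could register {zombie, I} (a plain tuple) instead of manufacturing a zombieN atom per process.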
io_lib:format returns a potentially "deep list" (i.e. it may contain other lists), while list_to_atom requires a "flat list". You can wrap the io_lib:format call in a call to lists:flatten:
list_to_atom(lists:flatten(io_lib:format("zombie~p", [1]))).
I am trying to implement a modified parallel depth-first search algorithm in Erlang (let's call it *dfs_mod*).
All I want to get is all the 'dead-end paths', which are basically the paths that are returned when *dfs_mod* visits a vertex without neighbours or a vertex whose neighbours were all already visited. I save each path to ets_table1 if my custom function fun1(Path) returns true and to ets_table2 if fun1(Path) returns false (I need to filter the resulting 'dead-end' paths with some custom filter).
I have implemented a sequential version of this algorithm and for some strange reason it performs better than the parallel one.
The idea behind the parallel implementation is simple:
visit a Vertex from [Vertex|Other_vertices] = Unvisited_neighbours,
add this Vertex to the current path;
send {self(), wait} to the 'collector' process;
run *dfs_mod* for Unvisited_neighbours of the current Vertex in a new process;
continue running *dfs_mod* with the rest of the provided vertices (Other_vertices);
when there are no more vertices to visit - send {self(), done} to the collector process and terminate;
So, basically each time I visit a vertex with unvisited neighbours I spawn a new depth-first search process and then continue with the other vertices.
Right after spawning the first *dfs_mod* process I start to collect all {Pid, wait} and {Pid, done} messages (the wait message is there to keep the collector waiting for all the done messages). When no message arrives for N milliseconds, the collector function returns ok.
For some reason, this parallel implementation runs anywhere from 8 to 160 seconds, while the sequential version takes just 4 seconds (the testing was done on a fully connected digraph with 5 vertices, on a machine with an Intel i5 processor).
Here are my thoughts on such a poor performance:
I pass the digraph Graph to each new process which runs *dfs_mod*. Maybe doing digraph:out_neighbours(Graph) against one digraph from many processes causes this slowness?
I accumulate the current path in a list and pass it to each new spawned *dfs_mod* process, maybe passing so many lists is the problem?
I use an ETS table to save the path each time I visit a new vertex and add it to the path. The ETS properties are [bag, public, {write_concurrency, true}], but maybe I am doing something wrong?
each time I visit a new vertex and add it to the path, I check a path with a custom function fun1() (it basically checks if the path has vertices labeled with letter "n" occurring before vertices with "m" and returns true/false depending on the result). Maybe this fun1() slows things down?
I have tried to run *dfs_mod* without collecting done and wait messages, but htop shows a lot of Erlang activity for quite a long time after *dfs_mod* returns ok in the shell, so I do not think that the active message passing slows things down.
How can I make my parallel dfs_mod run faster than its sequential counterpart?
Edit: when I run the parallel *dfs_mod*, pman shows no processes at all, although htop shows that all 4 CPU threads are busy.
There is no quick way to know without the code, but here's a quick list of why this might fail:
You might be confusing parallelism and concurrency. Erlang's model is shared-nothing and aims for concurrency first (running distinct units of code independently). Parallelism is only an optimization of this (running some of the units of code at the same time). Usually, parallelism will take form at a higher level, say you want to run your sorting function on 50 different structures -- you then decide to run 50 of the sequential sort functions (see the sketch after this list).
You might have synchronization problems or sequential bottlenecks, effectively changing your parallel solution into a sequential one.
The overhead of copying data, context switching and whatnot dwarfs the gains you get from parallelism. The former is especially true of large data sets that you break into sub data sets and then join back into a large one. The latter is especially true of highly sequential code, as seen in process ring benchmarks.
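To make the higher-level parallelism point concrete, here is a small sketch of a pmap helper (not something from OTP, just an illustration) that runs the ordinary, sequential lists:sort/1 on many independent lists, one process per list:

%% Illustrative only: parallelism over many inputs, each sorted sequentially.
pmap(Fun, Items) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn(fun() -> Parent ! {Ref, Fun(Item)} end),
                Ref
            end || Item <- Items],
    [receive {Ref, Result} -> Result end || Ref <- Refs].

sort_many(Lists) ->
    pmap(fun lists:sort/1, Lists).

Each worker runs the unchanged sequential code; the parallelism comes entirely from having many independent inputs.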
If I wanted to optimize this, I would try to reduce message passing and data copying to a minimum.
If I were the one working on this, I would keep the sequential version. It does what it says it should do, and when it is part of a larger system, as soon as you have more processes than cores, parallelism will come from the many calls to the sort function rather than from branches within the sort function. In the long run, if it is part of a server or service, using the sequential version N times should have no more negative impact than a parallel one that ends up creating many, many more processes to do the same task and risks overloading the system more.