I'm reading 7 concurrency models in 7 weeks and the author asks to implement
cache that distributes cache entries across multiple actors according
to a hash function. Create a supervisor that starts multiple cache
actors and routes incoming messages to the appropriate cache worker.
What action should this supervisor take if one of the cache workers
fails?
defmodule CacheSupervisor do
def start(number_of_caches) do
spawn(__MODULE__, :init, [number_of_caches])
end
def init(number_of_caches) do
Process.flag(:trap_exit, true)
pids = Enum.into(Enum.map(0..number_of_caches-1, fn(id) ->
{id, Cache.start}
end), %{})
loop(pids, number_of_caches)
end
def loop(pids, number_of_caches) do
receive do
{:put, url, page} -> send(compute_cache_pid(pids, url, number_of_caches), {:put, url, page})
loop(pids, number_of_caches)
{:get, sender, ref, url} -> send(compute_cache_pid(pids, url, number_of_caches), {:get, sender, ref, url})
loop(pids, number_of_caches)
{:EXIT, pid, reason} -> IO.puts("Cache #{pid} failed with reason #{inspect reason} - restarting it")
loop(repair(pids, pid), number_of_caches)
end
end
def restart(pids, pid) do
#not clear how to repair, since we don't know the index of the failed process
end
def compute_cache_pid(pids, url, number_of_caches) do
pids[rem(:erlang.phash2(url), number_of_caches)]
end
end
When a process fails, supervisor only get its pid. Using just pid, I can't understand which bucket needs a new process. I probably could create a second map pid -> bucket_index, but it seems ugly.
Can I somehow attach additional attributes when I spawn a process? Can I read those attributes when process exits?
What would be the best way to solve it (without OTP)?
This implementation also makes supervisor too "smart". As I understand supervisors should be simple and only care about restarting failed processes.
I'm not sure how to separate restarting and routing logic into 2 separate processes: when supervisor restarts some process, router should be aware of it and have up-to-date pids.
Related
Is it possible to register a :poolboy pool in registry (:gproc or Registry in elixir 1.4) after it was started?
I need to implement some kind on pub/sub architecture on pools. And I'd like to register multiple pools under the same alias.
Registry has :duplicate, and :gproc has :p, but it looks like none of them works with :via tuples, so I can't use it in name of my pool.
It looks like Registry only allows registering the current process under a given name.
So to use registry, you'd need to start a process to act as a proxy to each Poolboy pool, eg a GenServer module PoolProxy.
defmodule PoolProxy do
use GenServer
def init(poolname) do
{:ok, _} = Registry.register(Registry.PoolPubSub, "PoolPubSub", nil)
{:ok, poolname}
end
def handle_call(:notify_pool, _from, poolname) do
# interact with poolboy pool here...
end
end
Once registered, you can then pub-sub to the proxy processes, with
Registry.dispatch(Registry.PoolPubSub, "PoolPubSub", fn entries ->
for {pid, _} <- entries, do: GenServer.call(pid, :notify_pool)
end)
Hi I want to implement distributed caches as an exercise. The cache module is based on gen_server. The caches are started by an CacheSupervisor module. At first I tried running it all on one node, which worked well. Now I am trying to distribute my caches on two nodes, which live in two open console windows on my laptop.
PS:
While writing this question I realised that I forgot to connect my third window to the other nodes. I fixed it, but I am still having the original error.
Consoles:
After connecting my nodes I callCacheSupervisor.start_link() in my third window, this results in the follwing error message.
Error:
** (EXIT from #PID<0.112.0>) shutdown: failed to start child: :de
** (EXIT) an exception was raised:
** (ArgumentError) argument error
erlang.erl:2619: :erlang.spawn(:node1#DELL_XPS, {:ok, #PID<0.128.0>})
(stdlib) supervisor.erl:365: :supervisor.do_start_child/2
(stdlib) supervisor.erl:348: :supervisor.start_children/3
(stdlib) supervisor.erl:314: :supervisor.init_children/2
(stdlib) gen_server.erl:328: :gen_server.init_it/6
(stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
I am guessing that the error indicates that the :gen_server.start_link(..) inside start_link(name) of my Cache Module resolves to {:ok, #PID<0.128.0>} which seems to be incorrect, but I am having no Idea where to put the Node.spawn() else
Module Cache:
defmodule Cache do
use GenServer
def handle_cast({:put, url, page}, {pages, size}) do
new_pages = Dict.put(pages, url, page)
new_size = size + byte_size(page)
{:noreply, {new_pages, new_size}}
end
def handle_call({:get, url}, _from, {pages, size}) do
{:reply, pages[url], {pages, size}}
end
def handle_call({:size}, _from, {pages, size}) do
{:reply, size, {pages, size}}
end
def start_link(name) do
IO.puts(elem(name,0))
Node.spawn(String.to_atom(elem(name, 0)), :gen_server.start_link({:local,elem(name, 1)}, __MODULE__, {HashDict.new, 0}, []))
end
def put(name, url, page) do
:gen_server.cast(name, {:put, url, page})
end
def get(name, url) do
:gen_server.call(name, {:get, url})
end
def size(name) do
:gen_server.call(name, {:size})
end
end
Module CacheSupervisor:
defmodule CacheSupervisor do
use Supervisor
def init(_args) do
workers = Enum.map( [{"node1#DELL_XPS", :de},{"node1#DELL_XPS", :edu}, {"node2#DELL_XPS", :com} ,{"node2#DELL_XPS", :it}, {"node2#DELL_XPS", :rest}],
fn(n)-> worker(Cache, [n], id: elem(n, 1)) end)
supervise(workers, strategy: :one_for_one)
end
def start_link() do
:supervisor.start_link(__MODULE__, [])
end
end
:erlang.spawn(:node1#DELL_XPS, {:ok, #PID<0.128.0>})
:erlang.spawn/2 is the same function as Node.spawn/2. The function expects node name (which you have provided) and a function. Your GenServer.start_link call returned {:ok, Pid} as it should. Since a tuple can't be treated like a function Node.spawn/2 crashes.
I would not recommend spawning processes on separate nodes like this. If remote node goes down, not only will you lose that node in your cluster, but you will also have to deal with the fallout from all your spawned processes. This will result an app that is more brittle than it would otherwise be. If you want to have your cache GenServers running on multiple nodes I'd suggest running the application you are building on multiple nodes, and having an instance of your CacheSupervisor on each node. Then each CacheSupervisor starts up it's own GenServers underneath it. This is more robust because if a node goes down the remaining nodes will be unaffected. Of course you application logic will need to take this into account, losing a node could mean losing cache data or client connections. See this answer for more details: How does an Erlang gen_server start_link a gen_server on another node?
If you really really want to a spawn process on a remote node like this you could do this:
:erlang.spawn_link(:node1#DELL_XPS, fun() ->
{:ok, #PID<0.128.0>} = :gen_server.start_link({:local,elem(name, 1)}, __MODULE__, {HashDict.new, 0}, [])
receive
% Block forever
:exit -> :ok
end
end)
Note that you must use spawn_link, as supervisors expect to be linked to their children. If the supervisor is not linked it will not know when the child crashes and won't be able to restart the process.
In Erlang, can I call some function f (BIF or not), whose job is to spawn a process, run the function argf I provided, and doesn't "return" until argf has "returned", and do this without using receive clause (the reason for this is that f will be invoked in a gen_server, I don't want pollute the gen_server's mailbox).
A snippet would look like this:
%% some code omitted ...
F = fun() -> blah, blah, timer:sleep(10000) end,
f(F), %% like `spawn(F), but doesn't return until 10 seconds has passed`
%% ...
The only way to communicate between processes is message passing (of course you can consider to poll for a specific key in an ets or a file but I dont like this).
If you use a spawn_monitor function in f/1 to start the F process and then have a receive block only matching the possible system messages from this monitor:
f(F) ->
{_Pid, MonitorRef} = spawn_monitor(F),
receive
{_Tag, MonitorRef, _Type, _Object, _Info} -> ok
end.
you will not mess your gen_server mailbox. The example is the minimum code, you can add a timeout (fixed or parameter), execute some code on normal or error completion...
You will not "pollute" the gen_servers mailbox if you spawn+wait for message before you return from the call or cast. A more serious problem with this maybe that you will block the gen_server while you are waiting for the other process to terminate. A way around this is to not explicitly wait but return from the call/cast and then when the completion message arrives handle it in handle_info/2 and then do what is necessary.
If the spawning is done in a handle_call and you want to return the "result" of that process then you can delay returning the value to the original call from the handle_info handling the process termination message.
Note that however you do it a gen_server:call has a timeout value, either implicit or explicit, and if no reply is returned it generates an error in the calling process.
Main way to communicate with process in Erlang VM space is message passing with erlang:send/2 or erlang:send/3 functions (alias !). But you can "hack" Erlang and use multiple way for communicating over process.
You can use erlang:link/1 to communicate stat of the process, its mainly used in case of your process is dying or is ended or something is wrong (exception or throw).
You can use erlang:monitor/2, this is similar to erlang:link/1 except the message go directly into process mailbox.
You can also hack Erlang, and use some internal way (shared ETS/DETS/Mnesia tables) or use external methods (database or other things like that). This is clearly not recommended and "destroy" Erlang philosophy... But you can do it.
Its seems your problem can be solved with supervisor behavior. supervisor support many strategies to control supervised process:
one_for_one: If one child process terminates and is to be restarted, only that child process is affected. This is the default restart strategy.
one_for_all: If one child process terminates and is to be restarted, all other child processes are terminated and then all child processes are restarted.
rest_for_one: If one child process terminates and is to be restarted, the 'rest' of the child processes (that is, the child processes after the terminated child process in the start order) are terminated. Then the terminated child process and all child processes after it are restarted.
simple_one_for_one: A simplified one_for_one supervisor, where all child processes are dynamically added instances of the same process type, that is, running the same code.
You can also modify or create your own supervisor strategy from scratch or base on supervisor_bridge.
So, to summarize, you need a process who wait for one or more terminating process. This behavior is supported natively with OTP, but you can also create your own model. For doing that, you need to share status of every started process, using cache or database, or when your process is spawned. Something like that:
Fun = fun
MyFun (ParentProcess, {result, Data})
when is_pid(ParentProcess) ->
ParentProcess ! {self(), Data};
MyFun (ParentProcess, MyData)
when is_pid(ParentProcess) ->
% do something
MyFun(ParentProcess, MyData2) end.
spawn(fun() -> Fun(self(), InitData) end).
EDIT: forgot to add an example without send/receive. I use an ETS table to store every result from lambda function. This ETS table is set when we spawn this process. To get result, we can select data from this table. Note, the key of the row is the process id of the process.
spawner(Ets, Fun, Args)
when is_integer(Ets),
is_function(Fun) ->
spawn(fun() -> Fun(Ets, Args) end).
Fun = fun
F(Ets, {result, Data}) ->
ets:insert(Ets, {self(), Data});
F(Ets, Data) ->
% do something here
Data2 = Data,
F(Ets, Data2) end.
To exchange data,it becomes important to link the process first.The following code does the job of linking two processes.
start_link(Name) ->
gen_fsm:start_link(?MODULE, [Name], []).
My Question : which are the two processes being linked here?
In your example, the process that called start_link/1 and the process being started as (?MODULE, Name, Args).
It is a mistake to think that two processes need to be linked to exchange data. Data links the fate of the two processes. If one dies, the other dies, unless a system process is the one that starts the link (a "system process" means one that is trapping exits). This probably isn't what you want. If you are trying to avoid a deadlock or do something other than just timeout during synchronous messaging if the process you are sending a message to dies before responding, consider something like this:
ask(Proc, Request, Data, Timeout) ->
Ref = monitor(process, Proc),
Proc ! {self(), Ref, {ask, Request, Data}},
receive
{Ref, Res} ->
demonitor(Ref, [flush]),
Res;
{'DOWN', Ref, process, Proc, Reason} ->
some_cleanup_action(),
{fail, Reason}
after
Timeout ->
{fail, timeout}
end.
If you are just trying to spawn a worker that needs to give you an answer, you might want to consider using spawn_monitor instead and using its {pid(), reference()} return as the message you're listening for in response.
As I mentioned above, the process starting the link won't die if it is trapping exits, but you really want to avoid trapping exits in most cases. As a basic rule, use process_flag(trap_exit, true) as little as possible. Getting trap_exit happy everywhere will have structural effects you won't intend eventually, and its one of the few things in Erlang that is difficult to refactor away from later.
The link is bidirectional, between the process which is calling the function start_link(Name) and the new process created by gen_fsm:start_link(?MODULE, [Name], []).
A called function is executed in the context of the calling process.
A new process is created by a spawn function. You should find it in the gen_fsm:start_link/3 code.
When a link is created, if one process exit for an other reason than normal, the linked process will die also, except if it has set process_flag(trap_exit, true) in which case it will receive the message {'EXIT',FromPid,Reason} where FromPid is the Pid of the process that came to die, and Reason the reason of termination.
It seems that an erlang process will stay alive until the 5 sec default timeout even if it has finished it's work.
I have gen_server call that issues a command to the window CLI which can be completed in less than 1 sec but the process waits 5 sec before I see the result of the operation. What's going on? is it soemthing to do with the timeout, or might it be something else.
EDIT This call doesn't do anything for 5 seconds (the default timeout!)
handle_call({create_app, Path, Name, Args}, _From, State) ->
case filelib:ensure_dir(Path) of
{error, Reason} ->
{reply, Reason, State};
_ ->
file:set_cwd(Path),
Response = os:cmd(string:join(["Rails", Name, Args], " ")),
{reply, Response, State}
end;
I'm guessing the os:cmd is taking that long to return the results. It's possible that maybe the os:cmd is having trouble telling when the rails command is completed and doesn't return till the process triggers the timeout. But from your code I'd say the most likely culprit is the os:cmd call.
Does the return contain everything you expect it to?
You still have not added any information on what the problem is. But I see some other things I'd like to comment on.
Current working dir
You are using file:set_cwd(Path) so the started command will inherit that path. The cwd of the file server is global. You should probably not be using it at all in application code. It's useful for setting the cwd to where you want erlang crash dumps to be written etc.
Your desire to let rail execute with the cwd according to Path is better served with something like this:
_ ->
Response = os:cmd(string:join(["cd", Path, "&&", "Rails", Name, Args], " ")),
{reply, Response, State}
That is, start a shell to parse the command line, have the shell change cwd and the start Rails.
Blocking a gen_server
The gen_server is there to serialize processing. That is, it processes one message after the other. It doesn't handle them all concurrently. It is its reason for existence to not handle them concurrently.
You are (in relation to other costs) doing some very costly computation in the gen_server: starting an external process that runs this rails application. Is it your intention to have at most one rails application running at any one time? (I've heard about ruby on rails requiring tons of memory per process, so it might be a sensible decision).
If you dont need to update the State with any values from a costly call, as in your example code, then you can use an explicit gen_server:reply/2 call.
_ ->
spawn_link(fun () -> rails_cmd(From, Path, Name, Args) end),
{no_reply, State}
And then you have
rails_cmd(From, Path, Name, Args) ->
Response = os:cmd(string:join(["cd", Path, "&&", "Rails", Name, Args], " ")),
gen_server:reply(From, Response).