How to create a keep-alive process in Erlang

I'm currently reading Programming Erlang. At the end of Chapter 13, the book creates a keep-alive process; the example looks like this:
on_exit(Pid, Fun) ->
    spawn(fun() ->
              Ref = monitor(process, Pid),
              receive
                  {'DOWN', Ref, process, Pid, Info} ->
                      Fun(Info)
              end
          end).
keep_alive(Name, Fun) ->
    register(Name, Pid = spawn(Fun)),
    on_exit(Pid, fun(_Why) -> keep_alive(Name, Fun) end).
But the process may exit between register/2 and on_exit/2, in which case the monitor would fail, so I changed keep_alive/2 like this:
keep_alive(Name, Fun) ->
    {Pid, Ref} = spawn_monitor(Fun),
    register(Name, Pid),
    receive
        {'DOWN', Ref, process, Pid, _Info} ->
            keep_alive(Name, Fun)
    end.
There is still a bug: the process may exit between spawn_monitor/1 and register/2. How can I make this work reliably? Thanks.

I'm not sure you have a problem that needs solving. monitor/2 succeeds even if your process exits after register/2: in that case it sends a 'DOWN' message whose Info component is noproc. Per the documentation:
A 'DOWN' message will be sent to the monitoring process if Item dies, if Item does not exist, or if the connection is lost to the node which Item resides on. (see http://www.erlang.org/doc/man/erlang.html#monitor-2).
So, in your original code:
1. register/2 associates Name with Pid.
2. Pid dies.
3. on_exit/2 is called, and monitor/2 is executed in the process it spawns.
4. monitor/2 immediately sends a 'DOWN' message, which is received by the process spawned by on_exit/2.
5. The Fun(Info) in the receive clause is executed, calling keep_alive/2 again.
I think all is good.
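A minimal sketch to check the noproc claim, assuming a hypothetical module name noproc_demo: it first waits until a process is definitely dead, then monitors it anyway.

```erlang
-module(noproc_demo).
-export([run/0]).

%% Monitoring a process that has already exited does not raise an error;
%% a 'DOWN' message with reason noproc is delivered right away.
run() ->
    %% Spawn a process that exits at once and wait until it is gone.
    {Pid, Ref0} = spawn_monitor(fun() -> ok end),
    receive {'DOWN', Ref0, process, Pid, normal} -> ok end,
    %% Monitor the dead pid: monitor/2 still returns a reference.
    Ref = erlang:monitor(process, Pid),
    receive
        {'DOWN', Ref, process, Pid, noproc} -> ok
    after 1000 ->
        timeout
    end.
```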

So why not use the Erlang supervisor behaviour? It provides useful functions for creating and restarting keep-alive processes.
See the example here: http://www.erlang.org/doc/design_principles/sup_princ.html
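A minimal sketch of a supervisor that keeps one registered worker alive, assuming hypothetical names (keep_alive_sup, keep_alive_worker, worker_loop/0) and map-style child specs. The worker registers itself, so there is no window between spawning and registering in the supervisor; if registration fails, the worker simply crashes and is restarted.

```erlang
-module(keep_alive_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% Restart only the crashed child; give up after 5 restarts in 10 s.
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    Child = #{id => keep_alive_worker,
              start => {?MODULE, start_worker, []},
              restart => permanent,
              type => worker},
    {ok, {SupFlags, [Child]}}.

%% Child start function: returns {ok, Pid} with Pid linked to the
%% supervisor. Registration happens inside the worker itself.
start_worker() ->
    Pid = spawn_link(fun() ->
                         register(keep_alive_worker, self()),
                         worker_loop()
                     end),
    {ok, Pid}.

%% Hypothetical placeholder work loop: waits for messages forever.
worker_loop() ->
    receive
        _Msg -> worker_loop()
    end.
```

The intensity/period limit also means that a worker which crashes immediately on every start is not respawned forever: once the limit is exceeded, the supervisor itself terminates.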

In your second example, if the process exits before registration, register/2 will fail with badarg. The easiest way around that is to wrap register/2 in try ... catch and handle the error in the catch clause.
You can even ignore the error in the catch clause, because even if registration fails, the 'DOWN' message will still be sent.
On the other hand, I wouldn't do that in a production system. If your worker fails that fast, the problem is very likely in its initialisation code, and I would want to know that it failed to register and have the system stop. Otherwise, it could fail and be respawned in an endless loop.
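A minimal sketch of the try ... catch variant described above, assuming a hypothetical module name keep_alive_try. It ignores the badarg from register/2 and relies on the 'DOWN' message, so it has exactly the endless-respawn risk mentioned in the last paragraph. keep_alive/2 never returns, so it should run in its own process.

```erlang
-module(keep_alive_try).
-export([keep_alive/2]).

keep_alive(Name, Fun) ->
    {Pid, Ref} = spawn_monitor(Fun),
    try
        register(Name, Pid)
    catch
        %% badarg: Pid already exited (or Name is taken). Ignored on
        %% purpose: the monitor still delivers 'DOWN' when Pid is gone.
        error:badarg -> ok
    end,
    receive
        {'DOWN', Ref, process, Pid, _Info} ->
            keep_alive(Name, Fun)
    end.
```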

Related

Correct usage of Erlang spawn_monitor

Still working through Joe's book, and having a hard time fully understanding monitors in general and spawn_monitor in particular. Here's the code I have; the exercise asks for a function that starts a process whose job is to print a heartbeat every 5 seconds, and then a function that monitors that process and restarts it. I didn't get to the restart part, because my monitor fails to even detect the process keeling over.
% simple "working" loop
loop_5_print() ->
    receive
    after 5000 ->
        io:format("I'm still alive~n"),
        loop_5_print()
    end.
% function to spawn and register a named worker
create_reg_keep_alive(Name) when not is_atom(Name) ->
    {error, badargs};
create_reg_keep_alive(Name) ->
    Pid = spawn(ex, loop_5_print, []),
    register(Name, Pid),
    {Pid, Name}.
% a simple monitor loop
monitor_loop(AName) ->
    Pid = whereis(AName),
    io:format("monitoring PID ~p~n", [Pid]),
    receive
        {'DOWN', _Ref, process, Pid, Why} ->
            io:format("~p died because ~p~n", [AName, Why]),
            % add the restart logic
            monitor_loop(AName)
    end.
% function to bootstrap a monitor
my_monitor(AName) ->
    case whereis(AName) of
        undefined -> {error, no_such_registration};
        _Pid -> spawn_monitor(ex, monitor_loop, [AName])
    end.
And here's me playing with it:
39> c("ex.erl").
{ok,ex}
40> ex:create_reg_keep_alive(myjob).
{<0.147.0>,myjob}
I'm still alive
I'm still alive
41> ex:my_monitor(myjob).
monitoring PID <0.147.0>
{<0.149.0>,#Ref<0.230612052.2032402433.56637>}
I'm still alive
I'm still alive
42> exit(whereis(myjob), stop).
true
43>
It sure stopped the loop_5_print "worker", but where's the line that the monitor was supposed to print? The only explanation I can see is that the message emitted by a process quitting this way doesn't match the pattern in the monitor loop's receive. But that's the only pattern the book introduces in this chapter, so I'm not buying this explanation.
spawn_monitor is not what you want here. spawn_monitor spawns a process and immediately starts monitoring it. When the spawned process dies, the process that called spawn_monitor gets a message that the process is dead. In my_monitor/1, that means the shell monitors the monitor_loop process, while nothing ever monitors myjob, so monitor_loop waits for a 'DOWN' message that never comes. You need to call erlang:monitor/2 from the process that you want to receive the DOWN messages in, with the second argument being the Pid to monitor.
Just add:
monitor(process, Pid),
after:
Pid = whereis(AName),
and it works:
1> c(ex).
{ok,ex}
2> ex:create_reg_keep_alive(myjob).
{<0.67.0>,myjob}
I'm still alive
I'm still alive
I'm still alive
3> ex:my_monitor(myjob).
monitoring PID <0.67.0>
{<0.69.0>,#Ref<0.2696002348.2586050567.188678>}
I'm still alive
I'm still alive
I'm still alive
4> exit(whereis(myjob), stop).
myjob died because stop
true
monitoring PID undefined
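The fix above still leaves out the restart step the exercise asks for. A minimal sketch of one way to add it, assuming a hypothetical module name ex_restart (so it does not clash with the question's ex). It passes the pid to monitor into monitor_loop instead of looking it up with whereis/1, which avoids the `monitoring PID undefined` case in the transcript, and it monitors each replacement directly.

```erlang
-module(ex_restart).
-export([create_reg_keep_alive/1, my_monitor/1]).

%% Heartbeat worker, as in the question.
loop_5_print() ->
    receive
    after 5000 ->
        io:format("I'm still alive~n"),
        loop_5_print()
    end.

create_reg_keep_alive(Name) when not is_atom(Name) ->
    {error, badargs};
create_reg_keep_alive(Name) ->
    Pid = spawn(fun loop_5_print/0),
    register(Name, Pid),
    {Pid, Name}.

%% Runs in the monitoring process: monitor the worker, wait for 'DOWN',
%% start a replacement, then monitor the replacement.
monitor_loop(Name, Pid) ->
    Ref = erlang:monitor(process, Pid),
    io:format("monitoring PID ~p~n", [Pid]),
    receive
        {'DOWN', Ref, process, Pid, Why} ->
            io:format("~p died because ~p~n", [Name, Why]),
            {NewPid, Name} = create_reg_keep_alive(Name),
            monitor_loop(Name, NewPid)
    end.

%% Plain spawn: the new process does its own monitoring.
my_monitor(Name) ->
    case whereis(Name) of
        undefined -> {error, no_such_registration};
        Pid -> spawn(fun() -> monitor_loop(Name, Pid) end)
    end.
```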

Erlang: spawn a process and wait for termination without using `receive`

In Erlang, can I call some function f (a BIF or not) whose job is to spawn a process and run the function argf I provided, and which doesn't "return" until argf has "returned", all without using a receive clause? (The reason is that f will be invoked in a gen_server, and I don't want to pollute the gen_server's mailbox.)
A snippet would look like this:
%% some code omitted ...
F = fun() -> blah, blah, timer:sleep(10000) end,
f(F), %% like spawn(F), but doesn't return until 10 seconds have passed
%% ...
The only way to communicate between processes is message passing (of course, you could poll for a specific key in an ETS table or a file, but I don't like that).
If you use spawn_monitor in f/1 to start the F process and then have a receive block that matches only the system message from this monitor:
f(F) ->
{_Pid, MonitorRef} = spawn_monitor(F),
receive
{_Tag, MonitorRef, _Type, _Object, _Info} -> ok
end.
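you will not mess up your gen_server mailbox: the receive only matches the message carrying this monitor's unique reference, so any other messages stay queued for the gen_server. This is the minimum code; you can add a timeout (fixed or passed as a parameter), execute some code on normal or error completion, and so on.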
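A sketch of the extensions mentioned above (a timeout parameter and different results for normal and error completion), assuming a hypothetical module name wait_for. On timeout it removes the monitor with the flush option, so no stray 'DOWN' message is left in the caller's mailbox, and kills the worker.

```erlang
-module(wait_for).
-export([f/2]).

%% Spawn F and block until it terminates or Timeout ms have passed.
f(F, Timeout) ->
    {Pid, Ref} = spawn_monitor(F),
    receive
        {'DOWN', Ref, process, Pid, normal} ->
            ok;
        {'DOWN', Ref, process, Pid, Reason} ->
            {error, Reason}
    after Timeout ->
        %% Stop monitoring (dropping any 'DOWN' already queued),
        %% then kill the worker.
        erlang:demonitor(Ref, [flush]),
        exit(Pid, kill),
        {error, timeout}
    end.
```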
You will not "pollute" the gen_server's mailbox if you spawn and wait for the message before you return from the call or cast. A more serious problem may be that you block the gen_server while you are waiting for the other process to terminate. One way around this is not to wait explicitly, but to return from the call/cast and handle the completion message in handle_info/2 when it arrives, doing whatever is necessary there.
If the spawning is done in handle_call and you want to return the "result" of that process, you can delay the reply to the original call until handle_info/2 handles the process termination message (return {noreply, State} from handle_call/3 and later call gen_server:reply/2).
Note that however you do it, gen_server:call has a timeout value, either implicit (5000 ms by default) or explicit, and if no reply arrives in time, the calling process exits with an error.
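A minimal sketch of the delayed-reply approach, assuming a hypothetical module async_runner and a hypothetical 15-second call timeout. handle_call/3 stores the caller under the monitor reference and returns immediately, so the server keeps serving other requests; handle_info/2 sends the reply when the 'DOWN' message arrives. Here the "result" is just the worker's exit reason.

```erlang
-module(async_runner).
-behaviour(gen_server).
-export([start_link/0, run/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% The caller blocks until the worker finishes; the server does not.
run(Server, Fun) ->
    gen_server:call(Server, {run, Fun}, 15000).

init([]) ->
    {ok, #{}}.   % map: monitor reference -> caller (From)

handle_call({run, Fun}, From, Pending) ->
    {_Pid, Ref} = spawn_monitor(Fun),
    %% No reply yet: remember who asked and return right away.
    {noreply, Pending#{Ref => From}}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info({'DOWN', Ref, process, _Pid, Reason}, Pending) ->
    case maps:take(Ref, Pending) of
        {From, Rest} ->
            gen_server:reply(From, {done, Reason}),
            {noreply, Rest};
        error ->
            {noreply, Pending}
    end;
handle_info(_Other, State) ->
    {noreply, State}.
```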
The main way for processes to communicate within the Erlang VM is message passing with erlang:send/2 (also written as the ! operator) or erlang:send/3. But you can "hack" Erlang and use other ways to communicate between processes.
You can use erlang:link/1 to propagate the state of a process; it is mainly used to signal that a process has died or ended, or that something went wrong (an exception or throw).
You can use erlang:monitor/2, which is similar to erlang:link/1 except that it is unidirectional and the notification always arrives as a 'DOWN' message in the monitoring process's mailbox, instead of as an exit signal.
You can also hack around this with shared state inside the VM (ETS/DETS/Mnesia tables) or with external means (a database or similar). This is clearly not recommended and goes against the Erlang philosophy, but you can do it.
It seems your problem could be solved with the supervisor behaviour, which supports several strategies for controlling supervised processes:
one_for_one: If one child process terminates and is to be restarted, only that child process is affected. This is the default restart strategy.
one_for_all: If one child process terminates and is to be restarted, all other child processes are terminated and then all child processes are restarted.
rest_for_one: If one child process terminates and is to be restarted, the 'rest' of the child processes (that is, the child processes after the terminated child process in the start order) are terminated. Then the terminated child process and all child processes after it are restarted.
simple_one_for_one: A simplified one_for_one supervisor, where all child processes are dynamically added instances of the same process type, that is, running the same code.
You can also modify or create your own supervisor strategy from scratch, or base it on supervisor_bridge.
So, to summarize, you need a process that waits for one or more processes to terminate. This behaviour is supported natively by OTP, but you can also build your own model. To do that, you need to share the status of every started process, either through a cache or database, or by passing it in when the process is spawned. Something like this:
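Parent = self(),
Fun = fun
    MyFun(ParentProcess, {result, Data}) when is_pid(ParentProcess) ->
        ParentProcess ! {self(), Data};
    MyFun(ParentProcess, MyData) when is_pid(ParentProcess) ->
        % do something here to compute the next state
        MyData2 = MyData,
        MyFun(ParentProcess, MyData2)
end,
% capture the parent's pid before spawning: inside the fun, self() is the child
spawn(fun() -> Fun(Parent, InitData) end).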
EDIT: I forgot to add an example without send/receive. Here I use an ETS table to store every result from the lambda function. The ETS table is passed in when we spawn the process. To get a result, we can look it up in this table. Note that the key of each row is the pid of the process that produced it.
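%% The table must be public (e.g. ets:new(results, [set, public])) so that
%% the spawned process, which does not own it, can insert into it.
%% Since OTP 20, unnamed tables are identified by a reference, not an integer.
spawner(Ets, Fun, Args)
  when (is_atom(Ets) orelse is_reference(Ets)),
       is_function(Fun, 2) ->
    spawn(fun() -> Fun(Ets, Args) end).
Fun = fun
    F(Ets, {result, Data}) ->
        ets:insert(Ets, {self(), Data});
    F(Ets, Data) ->
        % do something here
        Data2 = Data,
        F(Ets, Data2)
end.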
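A usage sketch for the ETS approach, assuming a hypothetical module ets_results and a hypothetical worker that doubles its input. The reader polls the table for the worker's pid (no receive clause in this code; timer:sleep/1 only waits). Polling like this is exactly the trade-off the answer warns about.

```erlang
-module(ets_results).
-export([demo/0]).

spawner(Ets, Fun, Args) when is_function(Fun, 2) ->
    spawn(fun() -> Fun(Ets, Args) end).

demo() ->
    %% public: the worker is not the owner but must be able to insert.
    Ets = ets:new(results, [set, public]),
    Fun = fun F(Tab, {result, Data}) -> ets:insert(Tab, {self(), Data});
              F(Tab, N) when is_integer(N) -> F(Tab, {result, N * 2})
          end,
    Pid = spawner(Ets, Fun, 21),
    Result = poll(Ets, Pid, 100),
    ets:delete(Ets),
    Result.

%% Check the table every 10 ms until the worker's row (keyed by its pid)
%% appears, giving up after Tries attempts.
poll(_Ets, _Pid, 0) ->
    {error, timeout};
poll(Ets, Pid, Tries) ->
    case ets:lookup(Ets, Pid) of
        [{Pid, Data}] -> {ok, Data};
        [] -> timer:sleep(10), poll(Ets, Pid, Tries - 1)
    end.
```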

spawn_monitor registered name in different module

I'm trying to monitor a process with a registered name in a different module than where the monitor code is placed. This is an assignment for school, which is why I'm not going to post all of my code. However, here's the outline:
module1:start() spawns a process and registers its name:
register(name, Pid = spawn(?MODULE, loop, [])), Pid.
The loop waits for messages. If the message is of the wrong type it crashes.
module2:start() should start the registered process in module1 and monitor it, restarting it if it's crashed. I've been able to get it working using:
spawn(?MODULE, loop, [module1:start()]).
Then in the loop function I use erlang:monitor(process, Pid).
This way of solving the problem means the registered process can crash before the monitoring starts. I've been looking at spawn_monitor, but haven't been able to get the monitoring to work. The latest I've tried is:
spawn(?MODULE, loop, [spawn_monitor(name, start, [])]).
It starts the registered process. I can send messages to it, but I can't seem to detect anything. In the loop function I have a receive block, where I try to pattern match {'DOWN', Ref, process, Pid, _Why}. I've tried using spawn_monitor in module1 instead of simply spawn, but I noticed no change. I've also been trying to solve this using links (as in spawn_link), but I haven't gotten that to work either.
Any suggestions? What am I monitoring, if I'm not monitoring the registered process?
Since this is a homework assignment, I won't give you a complete answer.
Generally, you need two loops, one in module1 to do the work, and one in module2 to supervise the work. You already have a module1:start/0 function that calls spawn to execute the module1:loop/0 function to do the work, but as you've stated, this leaves a window of vulnerability between the spawning of the process and its monitoring by module2 that you're trying to close. As a hint, you could change the start function to call spawn_monitor instead:
start() ->
{Pid, Ref} = spawn_monitor(?MODULE, loop, []),
register(name, Pid),
{Pid, Ref}.
and then your module2:start/0 function would just call it like this:
start() ->
    {Pid, Ref} = module1:start(),
    receive
        {'DOWN', Ref, process, Pid, _Why} ->
            %% restart the module1 pid
            %% details left out intentionally
            ok
    end.
Note that this implies that module2:start/0 needs a loop of some sort to spawn and monitor the module1 pid, and restart it when necessary. I leave that to your homework efforts.
Also, using spawn_link instead of spawn_monitor is definitely worth exploring.

Linking two processes in Erlang?

To exchange data, it becomes important to link the processes first. The following code does the job of linking two processes.
start_link(Name) ->
gen_fsm:start_link(?MODULE, [Name], []).
My question: which two processes are being linked here?
In your example, the process that called start_link/1 and the new gen_fsm process started by gen_fsm:start_link(?MODULE, [Name], []).
It is a mistake to think that two processes need to be linked to exchange data. Linking ties the fates of the two processes together: if one dies abnormally, the other dies too, unless it is a system process (a "system process" means one that is trapping exits), in which case it receives an 'EXIT' message instead. This probably isn't what you want. If you want to avoid a deadlock, or do something other than just time out when the process you send a synchronous request to dies before responding, consider something like this:
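ask(Proc, Request, Data, Timeout) ->
    Ref = monitor(process, Proc),
    Proc ! {self(), Ref, {ask, Request, Data}},
    receive
        {Ref, Res} ->
            demonitor(Ref, [flush]),
            Res;
        {'DOWN', Ref, process, _Object, Reason} ->
            %% _Object is {Name, Node} when Proc is a registered name
            some_cleanup_action(),
            {fail, Reason}
    after
        Timeout ->
            %% remove the monitor so no stale 'DOWN' message is left behind
            demonitor(Ref, [flush]),
            {fail, timeout}
    end.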
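A runnable sketch pairing a version of ask/4 (without the some_cleanup_action/0 placeholder) with a hypothetical echo server, assuming a hypothetical module ask_demo. It exercises the three outcomes: a reply, a timeout, and the server dying before it answers.

```erlang
-module(ask_demo).
-export([ask/4, start_echo/0]).

ask(Proc, Request, Data, Timeout) ->
    Ref = erlang:monitor(process, Proc),
    Proc ! {self(), Ref, {ask, Request, Data}},
    receive
        {Ref, Res} ->
            erlang:demonitor(Ref, [flush]),
            Res;
        {'DOWN', Ref, process, _Object, Reason} ->
            {fail, Reason}
    after Timeout ->
        erlang:demonitor(Ref, [flush]),
        {fail, timeout}
    end.

%% Hypothetical server: replies to echo, crashes on crash, ignores the rest.
start_echo() ->
    spawn(fun echo_loop/0).

echo_loop() ->
    receive
        {From, Ref, {ask, echo, Data}} ->
            From ! {Ref, {ok, Data}},
            echo_loop();
        {_From, _Ref, {ask, crash, _Data}} ->
            exit(crashed);
        {_From, _Ref, {ask, _Other, _Data}} ->
            %% Never replies, so the caller hits its timeout.
            echo_loop()
    end.
```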
If you are just trying to spawn a worker that needs to give you an answer, you might want to consider using spawn_monitor instead and using its {pid(), reference()} return as the message you're listening for in response.
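A minimal sketch of that spawn_monitor pattern, assuming a hypothetical module worker_answer. The worker tags its reply with its own pid, and the monitor reference catches a crash before any reply is sent. Because a process's messages and its exit are delivered in order, a normal reply always arrives before the corresponding 'DOWN'.

```erlang
-module(worker_answer).
-export([compute/2]).

compute(Fun, Timeout) ->
    Parent = self(),
    {Pid, Ref} = spawn_monitor(fun() -> Parent ! {self(), {ok, Fun()}} end),
    receive
        {Pid, Result} ->
            erlang:demonitor(Ref, [flush]),
            Result;
        {'DOWN', Ref, process, Pid, Reason} ->
            {error, Reason}
    after Timeout ->
        %% A reply sent just before the kill may still land in the mailbox.
        erlang:demonitor(Ref, [flush]),
        exit(Pid, kill),
        {error, timeout}
    end.
```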
As I mentioned above, a linked process won't die if it is trapping exits, but you really want to avoid trapping exits in most cases. As a basic rule, use process_flag(trap_exit, true) as little as possible. Getting trap_exit happy everywhere will eventually have structural effects you didn't intend, and it's one of the few things in Erlang that is difficult to refactor away from later.
The link is bidirectional, between the process which is calling the function start_link(Name) and the new process created by gen_fsm:start_link(?MODULE, [Name], []).
A called function is executed in the context of the calling process.
A new process is created by a spawn function. You should find it in the gen_fsm:start_link/3 code.
When a link is created, if one process exits for a reason other than normal, the linked process will die too, unless it has set process_flag(trap_exit, true), in which case it receives the message {'EXIT', FromPid, Reason}, where FromPid is the Pid of the process that died and Reason is the reason for termination.
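A minimal sketch of the trap_exit case, assuming a hypothetical module trap_demo. The experiment runs in a fresh process so that setting trap_exit does not leak into the caller; the linked child exits abnormally, and the trapping process receives an 'EXIT' message instead of dying.

```erlang
-module(trap_demo).
-export([run/0]).

run() ->
    Parent = self(),
    spawn(fun() -> Parent ! {trap_demo, experiment()} end),
    receive
        {trap_demo, Result} -> Result
    after 2000 ->
        timeout
    end.

experiment() ->
    process_flag(trap_exit, true),
    Child = spawn_link(fun() -> exit(oops) end),
    receive
        {'EXIT', Child, Reason} -> {survived, Reason}
    after 1000 ->
        timeout
    end.
```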

Erlang: correct way to stop a process

Good day, I have the following setup for my little service:
-module(mrtask_net).
-export([start/0, stop/0, listen/1]).
-define(SERVER, mrtask_net).
start() ->
Pid = spawn_link(fun() -> ?MODULE:listen(4488) end),
register(?SERVER, Pid),
Pid.
stop() ->
exit(?SERVER, ok).
....
And here is the repl excerpt:
(emacs@rover)83> mrtask_net:start().
<0.445.0>
(emacs@rover)84> mrtask_net:stop().
** exception error: bad argument
in function exit/2
called as exit(mrtask_net,ok)
in call from mrtask_net:stop/0
(emacs@rover)85>
As you can see, stopping the process produces an error, although the process does stop.
What does this error mean, and how do I make this clean?
Not being an Erlang programmer and going just from the documentation of exit (here), I'd say that exit/2 requires a process id as its first argument, whereas you are passing it an atom (?SERVER).
Try
exit(whereis(?SERVER), ok).
instead (whereis returns the process id associated with a name, see here)
You need to change the call to exit/2 as @MartinStettner has pointed out. The reason the process stops anyway is that you started it with spawn_link. Your process is therefore linked to the shell process. When you called mrtask_net:stop(), the error caused the shell process to crash, which then caused your process to crash because they were linked. A new shell process is then started automatically so you can keep working in the shell. You generally do want to start your servers with spawn_link, but it can cause confusion when you are testing them from the shell and they just "happen" to die.
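Another way to make stop/0 clean, sketched under the assumption that the server's loop can receive a stop message (the real listen/1 is elided in the question, so the loop below is a hypothetical placeholder and the module name mrtask_net_sketch is made up). The server returns from its loop and exits with reason normal, which does not kill linked processes such as the shell. A loop that blocks elsewhere, for example in a socket call, would need a different shutdown path.

```erlang
-module(mrtask_net_sketch).
-export([start/0, stop/0]).
-define(SERVER, mrtask_net_sketch).

start() ->
    Pid = spawn_link(fun loop/0),
    register(?SERVER, Pid),
    Pid.

%% Ask the server to exit normally and wait until it has gone.
stop() ->
    case whereis(?SERVER) of
        undefined ->
            {error, not_running};
        Pid ->
            Ref = erlang:monitor(process, Pid),
            Pid ! stop,
            receive
                {'DOWN', Ref, process, Pid, _Reason} -> ok
            after 5000 ->
                erlang:demonitor(Ref, [flush]),
                {error, timeout}
            end
    end.

%% Hypothetical placeholder for the real listen/1 loop.
loop() ->
    receive
        stop -> ok;
        _Other -> loop()
    end.
```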
I would suggest sticking with OTP. It really gives you tons of advantages (I can hardly imagine a case where OTP doesn't help).
So, if you want to stop a process in OTP, you can do something like this for a gen_server:
% process1.erl
% In case you get the cast message {stopme, Message}
handle_cast({stopme, _Message}, State) ->
    % you will stop
    {stop, normal, State};
handle_cast(Msg, State) ->
    % do your stuff here with Msg
    {noreply, State}.
% process2.erl
% Here is the code to stop process1
gen_server:cast(Pid, {stopme, "It's time to stop!"}).
You can find more about it here: http://www.erlang.org/doc/man/gen_server.html
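A self-contained sketch of process1 built around the handle_cast clauses above; start_link/0, stop/1 and the empty-map state are assumptions added to make it runnable. Returning {stop, normal, State} makes the gen_server call terminate/2 and exit with reason normal, so linked processes are not killed.

```erlang
-module(process1).
-behaviour(gen_server).
-export([start_link/0, stop/1]).
-export([init/1, handle_call/3, handle_cast/2, terminate/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% Asynchronous stop request, as sent from process2.erl above.
stop(Pid) ->
    gen_server:cast(Pid, {stopme, "It's time to stop!"}).

init([]) ->
    {ok, #{}}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast({stopme, _Message}, State) ->
    {stop, normal, State};
handle_cast(_Msg, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.
```

gen_server:cast/2 returns immediately; if the caller needs to wait until the server is gone, gen_server:stop/1 (available since OTP 18) stops it synchronously.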
