Still working through Joe's book, and having a hard time fully understanding monitors in general and spawn_monitor in particular. Here's the code I have; the exercise asks for a function that starts a process whose job is to print a heartbeat every 5 seconds, and then a function that monitors the above process and restarts it. I didn't get to the restart part, because my monitor fails to even detect the process keeling over.
% simple "working" loop
loop_5_print() ->
    receive
    after 5000 ->
        io:format("I'm still alive~n"),
        loop_5_print()
    end.
% function to spawn and register a named worker
create_reg_keep_alive(Name) when not is_atom(Name) ->
    {error, badargs};
create_reg_keep_alive(Name) ->
    Pid = spawn(ex, loop_5_print, []),
    register(Name, Pid),
    {Pid, Name}.
% a simple monitor loop
monitor_loop(AName) ->
    Pid = whereis(AName),
    io:format("monitoring PID ~p~n", [Pid]),
    receive
        {'DOWN', _Ref, process, Pid, Why} ->
            io:format("~p died because ~p~n", [AName, Why]),
            % add the restart logic
            monitor_loop(AName)
    end.
% function to bootstrap a monitor
my_monitor(AName) ->
    case whereis(AName) of
        undefined -> {error, no_such_registration};
        _Pid -> spawn_monitor(ex, monitor_loop, [AName])
    end.
And here's me playing with it:
39> c("ex.erl").
{ok,ex}
40> ex:create_reg_keep_alive(myjob).
{<0.147.0>,myjob}
I'm still alive
I'm still alive
41> ex:my_monitor(myjob).
monitoring PID <0.147.0>
{<0.149.0>,#Ref<0.230612052.2032402433.56637>}
I'm still alive
I'm still alive
42> exit(whereis(myjob), stop).
true
43>
It sure stopped the loop_5_print "worker" - but where's the line that the monitor was supposed to print? The only explanation I can see is that the message emitted by a process quitting in this manner doesn't match the pattern I'm matching on inside monitor_loop's receive. But that's the only pattern introduced in the book in this chapter, so I'm not buying this explanation.
spawn_monitor is not what you want here. spawn_monitor spawns a process and makes the caller monitor it: when the spawned process dies, it's the process that called spawn_monitor that gets the 'DOWN' message. In your my_monitor/1, that means the shell ends up monitoring the monitor_loop process, while monitor_loop itself never monitors the worker, so its receive has nothing to match. You need to call erlang:monitor/2 from the process that should receive the 'DOWN' messages, with the second argument being the Pid to monitor.
Just add:
monitor(process, Pid),
after:
Pid = whereis(AName),
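so that the loop becomes (only the monitor/2 line is new):
monitor_loop(AName) ->
    Pid = whereis(AName),
    monitor(process, Pid),
    io:format("monitoring PID ~p~n", [Pid]),
    receive
        {'DOWN', _Ref, process, Pid, Why} ->
            io:format("~p died because ~p~n", [AName, Why]),
            % add the restart logic
            monitor_loop(AName)
    end.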
and it works:
1> c(ex).
{ok,ex}
2> ex:create_reg_keep_alive(myjob).
{<0.67.0>,myjob}
I'm still alive
I'm still alive
I'm still alive
3> ex:my_monitor(myjob).
monitoring PID <0.67.0>
{<0.69.0>,#Ref<0.2696002348.2586050567.188678>}
I'm still alive
I'm still alive
I'm still alive
4> exit(whereis(myjob), stop).
myjob died because stop
true
monitoring PID undefined
Module test:
tester() ->
    receive
        X ->
            erlang:display("message.."),
            tester()
    end.

initialize() ->
    spawn_link(?MODULE, tester, []),
    erlang:display("Started successfully.").
REPL:
length(erlang:processes()). -> 23
Pid = spawn_link(test, initialize, []).
length(erlang:processes()). -> 24
exit(Pid).
length(erlang:processes()). -> 24
It seems that the spawned tester process is still running! How do I make sure that when I exit my application, all spawn_linked processes are killed too?
Well, you are actually starting two Erlang processes, not one. The first one, the one you are sending the exit signal to, has already died by the time you send it, so the exit has no effect.
The first process you start in the shell in this line:
Pid = spawn_link(test, initialize, []).
This process starts executing the initialize function, in which it starts the second process, and then it dies because there is nothing else to do. This is the process to which you are trying to send the exit signal.
To fix this simply return the correct Pid from the initialize function:
initialize() ->
    Pid = spawn_link(?MODULE, tester, []),
    erlang:display("Started successfully."),
    Pid.
And start it directly:
Pid2 = test:initialize().
Then you will be able to kill it with exit(Pid2).
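As for the broader question of cleaning up spawn_linked processes: that is exactly what the link gives you, as long as the exiting process terminates with a reason other than normal and the linked process isn't trapping exits. A small sketch (assuming tester/0 is exported from test):
Parent = spawn(fun() ->
                   spawn_link(test, tester, []),   % child linked to Parent
                   receive stop -> ok end          % keep Parent alive
               end),
exit(Parent, kill),           % Parent dies with reason killed ...
timer:sleep(100),
length(erlang:processes()).   % ... and the linked tester dies with it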
I'm trying to monitor a process with a registered name in a different module than where the monitor code is placed. This is an assignment for school, which is why I'm not going to post all of my code. However, here's the outline:
module1:start() spawns a process and registers its name:
register(name, Pid = spawn(?MODULE, loop, [])), Pid.
The loop waits for messages. If the message is of the wrong type it crashes.
module2:start() should start the registered process in module1 and monitor it, restarting it if it crashes. I've been able to get it working using:
spawn(?MODULE, loop, [module1:start()]).
Then in the loop function I use erlang:monitor(process, Pid).
This way of solving the problem means the registered process can crash before the monitoring starts. I've been looking at spawn_monitor, but haven't been able to get the monitoring to work. The latest I've tried is:
spawn(?MODULE, loop, [spawn_monitor(name, start, [])]).
It starts the registered process. I can send messages to it, but I can't seem to detect anything. In the loop function I have a receive block, where I try to pattern match {'DOWN', Ref, process, Pid, _Why}. I've tried using spawn_monitor in module1 instead of simply spawn, but I noticed no change. I've also been trying to solve this using links (as in spawn_link), but I haven't gotten that to work either.
Any suggestions? What am I monitoring, if I'm not monitoring the registered process?
Since this is a homework assignment, I won't give you a complete answer.
Generally, you need two loops, one in module1 to do the work, and one in module2 to supervise the work. You already have a module1:start/0 function that calls spawn to execute the module1:loop/0 function to do the work, but as you've stated, this leaves a window of vulnerability between the spawning of the process and its monitoring by module2 that you're trying to close. As a hint, you could change the start function to call spawn_monitor instead:
start() ->
    {Pid, Ref} = spawn_monitor(?MODULE, loop, []),
    register(name, Pid),
    {Pid, Ref}.
and then your module2:start/0 function would just call it like this:
start() ->
    {Pid, Ref} = module1:start(),
    receive
        {'DOWN', Ref, process, Pid, _Why} ->
            %% restart the module1 pid
            %% details left out intentionally
    end.
Note that this implies that module2:start/0 needs a loop of some sort to spawn and monitor the module1 pid, and restart it when necessary. I leave that to your homework efforts.
Also, using spawn_link instead of spawn_monitor is definitely worth exploring.
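A sketch of what the spawn_link route could look like, assuming module1 exports loop/0 and module2's process traps exits, so the crash arrives as a message instead of taking module2 down (restart details still left out):
%% module1 (sketch): link instead of monitor; the link is created
%% atomically with the spawn, so there is no vulnerable window
start() ->
    register(name, Pid = spawn_link(?MODULE, loop, [])),
    Pid.

%% module2 (sketch): trap exits so the crash shows up as an {'EXIT', ...} message
start() ->
    process_flag(trap_exit, true),
    Pid = module1:start(),
    receive
        {'EXIT', Pid, _Why} ->
            %% restart the module1 pid
            %% details left out intentionally
            ok
    end.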
I'm currently reading Programming Erlang. At the end of Chapter 13, we want to create a keep-alive process; the example looks like this:
on_exit(Pid, Fun) ->
    spawn(fun() ->
              Ref = monitor(process, Pid),
              receive
                  {'DOWN', Ref, process, Pid, Info} ->
                      Fun(Info)
              end
          end).

keep_alive(Name, Fun) ->
    register(Name, Pid = spawn(Fun)),
    on_exit(Pid, fun(_Why) -> keep_alive(Name, Fun) end).
but the process may exit between register/2 and on_exit/2, in which case the monitor will fail, so I changed keep_alive/2 like this:
keep_alive(Name, Fun) ->
    {Pid, Ref} = spawn_monitor(Fun),
    register(Name, Pid),
    receive
        {'DOWN', Ref, process, Pid, _Info} ->
            keep_alive(Name, Fun)
    end.
There's also a bug here: the process may exit between spawn_monitor/1 and register/2. How can this be made to work correctly? Thanks.
I'm not sure that you have a problem that needs solving. monitor/2 will succeed even if your process exits right after register/2; it will send a 'DOWN' message whose Info component is noproc. Per the documentation:
A 'DOWN' message will be sent to the monitoring process if Item dies, if Item does not exist, or if the connection is lost to the node which Item resides on. (see http://www.erlang.org/doc/man/erlang.html#monitor-2).
So, in your original code
register associates Name with the Pid
Pid dies
on_exit is called and monitor/2 is executed
monitor immediately sends a 'DOWN' message, which is received by the process spawned by on_exit
the Fun(Info) in the receive clause is executed, calling keep_alive/2
I think all is good.
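You can see the noproc case in isolation with something like this (a small sketch; the sleep is only there to make sure the pid really is dead before monitor/2 runs):
demo_noproc() ->
    Pid = spawn(fun() -> ok end),     % exits immediately
    timer:sleep(100),                 % give it time to be gone
    Ref = monitor(process, Pid),      % monitoring a dead pid still succeeds
    receive
        {'DOWN', Ref, process, Pid, Info} -> Info   % Info =:= noproc
    end.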
So why didn't you want to use the Erlang supervisor behaviour? It provides useful functionality for creating and restarting keep-alive processes.
See the example here: http://www.erlang.org/doc/design_principles/sup_princ.html
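For example, a minimal one_for_one supervisor could look roughly like this (a sketch; heartbeat_worker is a hypothetical module whose start_link/0 returns {ok, Pid}):
-module(hb_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% one permanent worker, restarted whenever it dies,
    %% at most 5 restarts in any 60-second window
    {ok, {{one_for_one, 5, 60},
          [{heartbeat, {heartbeat_worker, start_link, []},
            permanent, 5000, worker, [heartbeat_worker]}]}}.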
In your second example, if the process exits before registration, register will fail with badarg. The easiest way around that is to wrap register in try ... catch and handle the error in the catch clause.
You can even do nothing useful in the catch clause, because even if registration failed, the 'DOWN' message will still be sent.
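Concretely, something like this (a sketch of the adjusted keep_alive/2 with the registration wrapped in try ... catch):
keep_alive(Name, Fun) ->
    {Pid, Ref} = spawn_monitor(Fun),
    try register(Name, Pid)
    catch
        error:badarg -> ok   % e.g. the pid died before we could register it
    end,
    receive
        {'DOWN', Ref, process, Pid, _Info} ->
            keep_alive(Name, Fun)
    end.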
On the other hand, I wouldn't do that in a production system. If your worker fails that fast, the problem is very likely in its initialisation code, and I would want to know that it failed to register and stop the system. Otherwise it could fail and be respawned in an endless loop.
When running this code in the Erlang console
Pid = spawn(fun() -> "foo" end),link(Pid),receive X -> X end.
I receive the following error.
** exception error: no such process or port
in function link/1
called as link(<0.71.0>)
This happens because the process you spawn finishes very quickly: it only "returns" a string (and the return value goes nowhere, since it is the top-level function in the call stack of the new process), so it's very likely to finish before the emulator gets to the link call.
You can make it more likely to succeed by making the process sleep before exiting:
2> Pid = spawn(fun() -> timer:sleep(1000), "foo" end),link(Pid).
true
Note however that the receive expression in your example most likely won't receive anything, since the spawned process doesn't send any message, and the link won't generate any message either since the process exits normally, and the calling process most likely isn't trapping exits. You may want to do something like:
Parent = self(),
spawn(fun() -> Parent ! "foo" end),
receive X -> X end.
That returns "foo".
I created a supervisor that spawned a gen_server I called timer_server. One of the tasks of this timer_server is to manage registration and call timer:send_interval to send a message to a pid on a certain interval.
However, in the init of the gen_server, where I call timer:send_interval I was getting a lockup. The documentation said the timer: functions return immediately, so this was very troubling.
When I renamed my gen_server to record_timer_server this problem cleared up. My question is two fold then:
Why could I create a registered process timer_server, if there already was one when timer:start() was called by my application starting up?
Once started, why would this function not cause a badmatch finding the name, if it was calling in to my timer_server using the send_interval function?
I don't think code is necessary but I can update to add some if requested.
This can be recreated simply by doing the following which hangs on the call to timer:send_interval.
1> register(timer_server, self()).
true
2> timer:send_interval(5000, self(), hello).
While this fails...
1> timer:send_interval(5000, self(), hello).
{ok,{interval,#Ref<0.0.0.32>}}
2> register(timer_server, self()).
** exited: {badarg,[{erlang,register,[timer_server,<0.30.0>]},
So, it seems that the first call to timer tries to start a process called timer_server, and hangs if you've taken this name first.
As to why it hangs, timer.erl does:
ensure_started() ->
    case whereis(timer_server) of
        undefined ->
            C = {timer_server, {?MODULE, start_link, []}, permanent, 1000,
                 worker, [?MODULE]},
            supervisor:start_child(kernel_safe_sup, C), % kernel_safe_sup
            ok;
        _ -> ok
    end.
which returns fine, followed by a gen_server:call to timer_server. Your process then gets stuck waiting for itself to respond.
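You can reproduce the same deadlock without timer at all: any process that does a gen_server:call to a name it has registered for itself ends up waiting on its own mailbox (a small sketch; my_server is just an illustrative name, and the explicit timeout turns the hang into an exit after a second):
register(my_server, self()),                % pretend we are the server
gen_server:call(my_server, hello, 1000).    % the request lands in our own mailbox;
                                            % nobody will ever reply, so this
                                            % exits with a timeout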