Erlang supervisor shuts down after running child

I have a test module and a one_for_one supervisor.
test.erl
-module(test).
-export([do_job/1, run/2, start_worker/1]).
run(Id, Fun) ->
    test_sup:start_child(Id, [Fun]).
do_job(Fun) ->
    Fun().
start_worker(Args) ->
    Pid = spawn_link(test, do_job, Args),
    io:format("started ~p~n", [Pid]),
    {ok, Pid}.
test_sup.erl
-module(test_sup).
-behaviour(supervisor).
-export([start_link/0]).
-export([init/1]).
-export([start_child/2]).
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).
init(_Args) ->
    SupFlags = #{strategy => one_for_one, intensity => 2, period => 20},
    {ok, {SupFlags, []}}.
start_child(Id, Args) ->
    ChildSpecs = #{id => Id,
                   start => {test, start_worker, [Args]},
                   restart => transient,
                   shutdown => brutal_kill,
                   type => worker,
                   modules => [test]},
    supervisor:start_child(?MODULE, ChildSpecs).
Now I start the supervisor in the shell and run test:run(id, fun() -> erlang:throw(err) end). It works nicely and start_worker/1 runs three times, but after that an exception occurs and the supervisor process shuts down, so I have to start it again manually with test_sup:start_link(). What is the problem?
Shell:
1> test_sup:start_link().
{ok,<0.36.0>}
2> test:run(id, fun() -> erlang:throw(err) end).
started <0.38.0>
started <0.39.0>
started <0.40.0>
{ok,<0.38.0>}
=ERROR REPORT==== 16-Dec-2016::23:31:50 ===
Error in process <0.38.0> with exit value:
{{nocatch,err},[{test,do_job,1,[]}]}
=ERROR REPORT==== 16-Dec-2016::23:31:50 ===
Error in process <0.39.0> with exit value:
{{nocatch,err},[{test,do_job,1,[]}]}
=ERROR REPORT==== 16-Dec-2016::23:31:50 ===
Error in process <0.40.0> with exit value:
{{nocatch,err},[{test,do_job,1,[]}]}
** exception exit: shutdown

What is the problem?
There is no "problem". It's working exactly as you told it to:
To prevent a supervisor from getting into an infinite loop of child process terminations and restarts, a maximum restart intensity is defined using two integer values specified with keys intensity and period in the above map. Assuming the values MaxR for intensity and MaxT for period, then, if more than MaxR restarts occur within MaxT seconds, the supervisor terminates all child processes and then itself.
Your supervisor's configuration says, "If I have to restart a child more than two times (intensity) in 20 seconds (period), then something is wrong, so just shut down." As to why you have to restart the supervisor manually, it's because your supervisor isn't supervised itself. Otherwise, the supervisor's supervisor might try to restart it based on its own configuration.
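To illustrate the last point, here is a minimal sketch of a parent supervisor that supervises test_sup, so that test_sup is started again after it gives up. The module name test_top_sup and its intensity and period values are assumptions made for this example, not part of the question's code.

```erlang
%% test_top_sup.erl (hypothetical parent supervisor for test_sup)
-module(test_top_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% Assumed limits: give up only after more than 5 restarts of
    %% test_sup within 60 seconds.
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 60},
    ChildSpecs = [#{id => test_sup,
                    start => {test_sup, start_link, []},
                    restart => permanent,
                    shutdown => infinity,   % recommended for supervisor children
                    type => supervisor,
                    modules => [test_sup]}],
    {ok, {SupFlags, ChildSpecs}}.
```

With this in place you would call test_top_sup:start_link() instead of test_sup:start_link(). Note that children added with supervisor:start_child/2 are not remembered across a restart of test_sup, because its init/1 returns an empty child list.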

Related

Erlang beginnings: moving a function from an escript into OTP

There is a simple implementation of the factorial function in an 'escript' in the Erlang docs. The factorial function is given as:
fac(0) -> 1;
fac(N) -> N * fac(N-1).
That's all fine, I can get this to work, no problem.
I would however like to know how I can implement this same, simple factorial function in an 'OTP way' using rebar3?
Just to be clear, my questions are:
Where does the code go?
How would I call it from the shell?
Could I also run it from the command line like I do via the escript example?
FYI, I have gotten started with rebar3. Here is where I am at:
rebar3 new app factorial
creates a few files but specifically the code is in 3 files in a src directory. I can see that a supervisor is being used, seems fine.
I can interact with this project from the shell:
$ rebar3 shell
1> application:which_applications().
[{factorial,"An OTP application","0.1.0"},
{inets,"INETS CXC 138 49","7.0.3"},
{ssl,"Erlang/OTP SSL application","9.1.1"},
{public_key,"Public key infrastructure","1.6.4"},
{asn1,"The Erlang ASN1 compiler version 5.0.8","5.0.8"},
{crypto,"CRYPTO","4.4"},
{stdlib,"ERTS CXC 138 10","3.7"},
{kernel,"ERTS CXC 138 10","6.2"}]
2> application:stop(factorial).
=INFO REPORT==== 21-Jan-2019::12:42:07.484244 ===
application: factorial
exited: stopped
type: temporary
ok
3> application:start(factorial).
ok
Where does the code go?
To 'call code in the OTP way', you can put it behind a gen_server.
For this simple factorial function, I added a new file factorial.erl within the src directory which is pretty much a standard gen_server skeleton with my factorial function as one of the callbacks:
% factorial.erl
-module(factorial).
-behaviour(gen_server).
-export([start_link/0, stop/0, calc/1]).
<boilerplate gen_server stuff here, like init, etc.>
calc(N) ->
    {ok, Result} = gen_server:call(?SERVER, {calc, N}),
    {ok, Result}.
handle_call({calc, N}, _From, State) ->
    Factorial = factorial(N),
    Reply = {ok, Factorial},
    {reply, Reply, State};
factorial(0) ->
    1;
factorial(N) ->
    N * factorial(N-1).
Since my rebar3 new app factorial created a supervisor, I modified the supervisor's init so that it calls my factorial module:
% factorial_sup.erl
<skeleton supervisor stuff here>
init([]) ->
    Server = {factorial, {factorial, start_link, []},
              permanent, 2000, worker, [factorial]},
    Children = [Server],
    RestartStrategy = {one_for_one, 0, 1},
    {ok, {RestartStrategy, Children}}.
How do I call it from the shell?
$ rebar3 shell
<Enter>
1> factorial:calc(5).
{ok,120}
Since this is running under a supervisor, we can still stop and restart it:
2> application:stop(factorial).
=INFO REPORT==== 22-Jan-2019::13:31:29.243520 ===
application: factorial
exited: stopped
type: temporary
ok
3> factorial:calc(5).
** exception exit: {noproc,{gen_server,call,[factorial,{calc,5}]}}
in function gen_server:call/2 (gen_server.erl, line 215)
in call from factorial:calc/1 (/Users/robert/git/factorial/src/factorial.erl, line 32)
4> application:start(factorial).
ok
5> factorial:calc(5).
{ok,120}
How do I create an executable?
Work in progress :-).

Can't start process in erlang node

I have two Erlang nodes: node01 is 'vm01@192.168.146.128' and node02 is 'vm02@192.168.146.128'. I want to start a process on node01 by calling spawn(Node, Mod, Fun, Args) from node02, but the pid I get back is always useless.
Node connection is ok:
(vm02@192.168.146.128)14> net_adm:ping('vm01@192.168.146.128').
pong
Module is in the path of node01 and node02:
(vm01@192.168.146.128)7> m(remote_process).
Module: remote_process
MD5: 99784aa56b4feb2f5feed49314940e50
Compiled: No compile time info available
Object file: /src/remote_process.beam
Compiler options: []
Exports:
init/1
module_info/0
module_info/1
start/0
ok
(vm02@192.168.146.128)20> m(remote_process).
Module: remote_process
MD5: 99784aa56b4feb2f5feed49314940e50
Compiled: No compile time info available
Object file: /src/remote_process.beam
Compiler options: []
Exports:
init/1
module_info/0
module_info/1
start/0
ok
However, the spawn is not successful:
(vm02@192.168.146.128)21> spawn('vm01@192.168.146.128', remote_process, start, []).
I'm on node 'vm01@192.168.146.128'
<9981.89.0>
My pid is <9981.90.0>
(vm01@192.168.146.128)8> whereis(remote_process).
undefined
The process is able to run on local node:
(vm02@192.168.146.128)18> remote_process:start().
I'm on node 'vm02@192.168.146.128'
My pid is <0.108.0>
{ok,<0.108.0>}
(vm02@192.168.146.128)24> whereis(remote_process).
<0.115.0>
But it fails on remote node. Can anyone give me some idea?
Here is the source code remote_process.erl:
-module(remote_process).
-behaviour(supervisor).
-export([start/0, init/1]).
start() ->
{ok, Pid} = supervisor:start_link({global, ?MODULE}, ?MODULE, []),
{ok, Pid}.
init([]) ->
io:format("I'm on node ~p~n", [node()]),
io:format("My pid is ~p~n", [self()]),
{ok, {{one_for_one, 1, 5}, []}}.
You registered the process globally, which is what you need here, so look it up with global:whereis_name(remote_process) rather than whereis/1.
Edit: the lookup works if
the 2 nodes are connected (check with nodes())
the process is registered with the global module
the process is still alive
If any of these conditions is not satisfied, you will get undefined.
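The difference between the two registries can be seen side by side with a small helper. This is only a sketch; the module name lookup_demo and the function where/1 are made up for illustration.

```erlang
%% lookup_demo.erl (hypothetical helper)
-module(lookup_demo).
-export([where/1]).

%% Returns what each registry knows about Name:
%% whereis/1 consults only the local registry of the current node,
%% global:whereis_name/1 consults the registry shared by connected nodes.
where(Name) ->
    #{local => whereis(Name),
      global => global:whereis_name(Name)}.
```

For a supervisor started with {global, remote_process}, the local entry stays undefined on every node, while the global entry holds the pid for as long as the process is alive and the nodes are connected.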
Edit 2: start node 1 with werl -sname p1 and type in the shell:
(p1@W7FRR00423L)1> c(remote_process).
{ok,remote_process}
(p1@W7FRR00423L)2> remote_process:start().
I'm on node p1@W7FRR00423L
My pid is <0.69.0>
{ok,<0.69.0>}
(p1@W7FRR00423L)3> global:whereis_name(remote_process).
<0.69.0>
(p1@W7FRR00423L)4>
then start a second node with werl -sname p2 and type in the shell (it is fine to connect the second node later; the global registration is "updated" when necessary):
(p2@W7FRR00423L)1> net_kernel:connect_node(p1@W7FRR00423L).
true
(p2@W7FRR00423L)2> nodes().
[p1@W7FRR00423L]
(p2@W7FRR00423L)3> global:whereis_name(remote_process).
<7080.69.0>
(p2@W7FRR00423L)4>
Edit 3:
In your test you are spawning a process P1 on the remote node, which executes the function remote_process:start/0.
This function calls supervisor:start_link/3, which spawns a new supervisor process P2 and links P1 to it. After this, P1 has nothing left to do, so it dies, causing the linked process P2 to die too, and you get an undefined reply to the global:whereis_name call.
In my test, I start the process from the shell of the remote node; the shell does not die after I evaluate remote_process:start/0, so the supervisor process does not die and global:whereis_name finds the requested pid.
If you want the supervisor to survive the call, you need an intermediate process that is spawned without a link, so that it does not die with its parent. Here is a small example based on your code:
-module(remote_process).
-behaviour(supervisor).
-export([start/0, init/1, local_spawn/0, remote_start/1]).
remote_start(Node) ->
    spawn(Node, ?MODULE, local_spawn, []).
local_spawn() ->
    % spawn without a link so that start_wait_stop survives
    % the death of the local_spawn process
    spawn(fun start_wait_stop/0).
start_wait_stop() ->
    start(),
    % stay alive (and keep the linked supervisor alive) until told to stop
    receive
        stop -> ok
    end.
start() ->
    io:format("start (~p)~n", [self()]),
    {ok, Pid} = supervisor:start_link({global, ?MODULE}, ?MODULE, []),
    {ok, Pid}.
init([]) ->
    io:format("I'm on node ~p~n", [node()]),
    io:format("My pid is ~p~n", [self()]),
    {ok, {{one_for_one, 1, 5}, []}}.
In the shell on node 1 you get:
(p1@W7FRR00423L)1> net_kernel:connect_node(p2@W7FRR00423L).
true
(p1@W7FRR00423L)2> c(remote_process).
{ok,remote_process}
(p1@W7FRR00423L)3> global:whereis_name(remote_process).
undefined
(p1@W7FRR00423L)4> remote_process:remote_start(p2@W7FRR00423L).
<7080.68.0>
start (<7080.69.0>)
I'm on node p2@W7FRR00423L
My pid is <7080.70.0>
(p1@W7FRR00423L)5> global:whereis_name(remote_process).
<7080.70.0>
(p1@W7FRR00423L)6> global:whereis_name(remote_process).
undefined
and on node 2:
(p2@W7FRR00423L)1> global:registered_names(). % before step 4
[]
(p2@W7FRR00423L)2> global:registered_names(). % after step 4
[remote_process]
(p2@W7FRR00423L)3> rp(processes()).
[<0.0.0>,<0.1.0>,<0.4.0>,<0.30.0>,<0.31.0>,<0.33.0>,
<0.34.0>,<0.35.0>,<0.36.0>,<0.37.0>,<0.38.0>,<0.39.0>,
<0.40.0>,<0.41.0>,<0.42.0>,<0.43.0>,<0.44.0>,<0.45.0>,
<0.46.0>,<0.47.0>,<0.48.0>,<0.49.0>,<0.50.0>,<0.51.0>,
<0.52.0>,<0.53.0>,<0.54.0>,<0.55.0>,<0.56.0>,<0.57.0>,
<0.58.0>,<0.62.0>,<0.64.0>,<0.69.0>,<0.70.0>]
ok
(p2@W7FRR00423L)4> pid(0,69,0) ! stop. % between steps 5 and 6
stop
(p2@W7FRR00423L)5> global:registered_names().
[]
[]

Erlang simple_one_for_one supervisor does not restart child

I have a test module and a simple_one_for_one supervisor.
test.erl
-module(test).
-export([
    run/1,
    do_job/1
]).
run(Fun) ->
    test_sup:start_child([Fun]).
do_job(Fun) ->
    Pid = spawn(Fun),
    io:format("started ~p~n", [Pid]),
    {ok, Pid}.
test_sup.erl
-module(test_sup).
-behaviour(supervisor).
-export([start_link/0]).
-export([init/1]).
-export([start_child/1]).
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).
init(_Args) ->
    SupFlags = #{strategy => simple_one_for_one, intensity => 2, period => 20},
    ChildSpecs = [#{id => test,
                    start => {test, do_job, []},
                    restart => permanent,
                    shutdown => brutal_kill,
                    type => worker,
                    modules => [test]}],
    {ok, {SupFlags, ChildSpecs}}.
start_child(Args) ->
    supervisor:start_child(?MODULE, Args).
I start the supervisor in the shell with test_sup:start_link(). After that I run test:run(fun() -> erlang:throw(err) end). I expect do_job to be restarted twice, but it never is. What is the problem?
Here is shell:
1> test_sup:start_link().
{ok,<0.36.0>}
2> test:run(fun() -> erlang:throw(err) end).
started <0.38.0>
{ok,<0.38.0>}
3>
=ERROR REPORT==== 16-Dec-2016::22:08:41 ===
Error in process <0.38.0> with exit value:
{{nocatch,err},[{erlang,apply,2,[]}]}
Manually restarting children is contrary to how simple_one_for_one supervisors are defined. Per the supervisor docs:
Functions delete_child/2 and restart_child/2 are invalid for simple_one_for_one supervisors and return {error,simple_one_for_one} if the specified supervisor uses this restart strategy.
That restriction applies only to those manual calls. A simple_one_for_one supervisor is intended for dynamic children that share one child spec and are defined on the fly by passing in additional startup args when you request the child, and it still restarts such children automatically according to their restart type. It can only do that for children it is linked to: the start function must create and link to the child process, whereas do_job/1 here uses spawn/1, so the supervisor never receives the exit signal and has nothing to restart.
Basically, this type of supervisor is meant for managing, and tidily shutting down, a dynamic pool of workers.
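Separately, the supervisor documentation requires a child's start function to create and link to the child process, and do_job/1 in the question uses spawn/1. A minimal sketch of a linked variant of the question's test module, assuming test_sup is left unchanged:

```erlang
%% test.erl (variant of the question's module, using spawn_link/1)
-module(test).
-export([run/1, do_job/1]).

run(Fun) ->
    test_sup:start_child([Fun]).

%% Called by the supervisor; spawn_link/1 links the new process to the
%% caller (the supervisor), so the supervisor receives its exit signal.
do_job(Fun) ->
    Pid = spawn_link(Fun),
    io:format("started ~p~n", [Pid]),
    {ok, Pid}.
```

With a fun that always throws, the child fails immediately every time it is started, so the supervisor would be expected to exceed its intensity of 2 within 20 seconds and shut itself down.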

Erlang supervisor does not restart child

I'm trying to learn about erlang supervisors. I have a simple printer process that prints hello every 3 seconds. I also have a supervisor that must restart the printer process if any exception occurs.
Here is my code:
test.erl:
-module(test).
-export([start_link/0]).
start_link() ->
    io:format("started~n"),
    Pid = spawn_link(fun() -> loop() end),
    {ok, Pid}.
loop() ->
    timer:sleep(3000),
    io:format("hello~n"),
    loop().
test_sup.erl:
-module(test_sup).
-behaviour(supervisor).
-export([start_link/0]).
-export([init/1]).
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).
init(_Args) ->
    SupFlags = #{strategy => one_for_one, intensity => 1, period => 5},
    ChildSpecs = [#{id => test,
                    start => {test, start_link, []},
                    restart => permanent,
                    shutdown => brutal_kill,
                    type => worker,
                    modules => [test]}],
    {ok, {SupFlags, ChildSpecs}}.
Now I run this program, start the supervisor with test_sup:start_link(), and after a few seconds raise an exception. Why does the supervisor not restart the printer process?
Here is the shell output:
1> test_sup:start_link().
started
{ok,<0.36.0>}
hello
hello
hello
hello
2> erlang:error(err).
=ERROR REPORT==== 13-Dec-2016::00:57:10 ===
** Generic server test_sup terminating
** Last message in was {'EXIT',<0.34.0>,
{err,
[{erl_eval,do_apply,6,
[{file,"erl_eval.erl"},{line,674}]},
{shell,exprs,7,
[{file,"shell.erl"},{line,686}]},
{shell,eval_exprs,7,
[{file,"shell.erl"},{line,641}]},
{shell,eval_loop,3,
[{file,"shell.erl"},{line,626}]}]}}
** When Server state == {state,
{local,test_sup},
one_for_one,
[{child,<0.37.0>,test,
{test,start_link,[]},
permanent,brutal_kill,worker,
[test]}],
undefined,1,5,[],0,test_sup,[]}
** Reason for termination ==
** {err,[{erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,674}]},
{shell,exprs,7,[{file,"shell.erl"},{line,686}]},
{shell,eval_exprs,7,[{file,"shell.erl"},{line,641}]},
{shell,eval_loop,3,[{file,"shell.erl"},{line,626}]}]}
** exception error: err
Here's the architecture you've created with your files:
test_sup (supervisor)
^
|
v
test (worker)
Then you start your supervisor by calling start_link() in the shell. This creates another bidirectional link:
shell
^
|
v
test_sup (supervisor)
^
|
v
test (worker)
With a bidirectional link, if either side dies abnormally, the other side is taken down too.
When you run erlang:error, you're causing an error in your shell!
Your shell is linked to your supervisor, so Erlang kills the supervisor in response. By chain reaction, your worker gets killed too.
I think you intended to send the error condition to your worker rather than the shell:
Determine the Pid of your worker: supervisor:which_children(test_sup)
Call erlang:exit(Pid, Reason) on the worker's Pid.
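The two steps above can be sketched as one helper. The module name kill_worker and the function kill_child/2 are hypothetical; the exit reason err is taken from the question.

```erlang
%% kill_worker.erl (hypothetical helper)
-module(kill_worker).
-export([kill_child/2]).

%% Look up the pid of child Id under supervisor Sup and send it an
%% exit signal with reason err, as if the worker itself had failed.
kill_child(Sup, Id) ->
    Children = supervisor:which_children(Sup),
    {Id, Pid, _Type, _Modules} = lists:keyfind(Id, 1, Children),
    true = is_pid(Pid),
    exit(Pid, err).
```

With the question's modules loaded and the supervisor running, kill_worker:kill_child(test_sup, test) kills only the printer process. The shell itself is untouched, so the supervisor stays alive and can restart the worker.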
When you execute erlang:error(err)., you are killing the calling process, which is your shell.
Because you used start_link to start the supervisor, it is killed as well, and so is the loop.
The shell is automatically restarted (thanks to some supervisor), but nobody restarts your test supervisor, so it cannot restart the loop.
To run this test you should do the following:
in module test:
start_link() ->
    Pid = spawn_link(fun() -> loop() end),
    io:format("started ~p~n",[Pid]),
    {ok, Pid}.
you will see this printed:
started <0.xx.0>
where <0.xx.0> is the loop pid, and in the shell you can call
exit(pid(0,xx,0), err).
to kill the loop only.

Supervisor crashes when worker does, where Supervisor shouldn't

I have an Erlang supervisor that supervises a worker server based on gen_server. I start my supervisor from the shell, and it in turn starts my worker server with no problems. It looks like this:
start_link() ->
    supervisor:start_link({local, ?SERVER}, ?MODULE, []).
But when I crash my worker server, my supervisor crashes with it for no apparent reason.
I found a fix for this on the internet, which I now use:
start_link_shell() ->
    {ok,Pid} = supervisor:start_link({local, ?SERVER}, ?MODULE, []),
    unlink(Pid).
Now it works fine, but I don't understand why. Can anyone explain?
Update
This is my init function
%%%===================================================================
init([]) ->
    % Building Supervisor specifications
    RestartStrategy = one_for_one,
    MaxRestarts = 2,
    MaxSecondsBetweenRestarts = 5000,
    SupFlags = {RestartStrategy, MaxRestarts, MaxSecondsBetweenRestarts},
    % Building Child specifications
    Restart = permanent,
    Shutdown = 2000, % Milliseconds the child is given to terminate after the shutdown request
    Type = worker,
    ChildSpec = {'db_server',
                 {'db_server', start_link, []},
                 Restart,
                 Shutdown,
                 Type,
                 ['db_server']},
    % Putting Supervisor and Child(ren) specifications in the return
    {ok, {SupFlags, [ChildSpec]}}.
As per this link:
The problem testing supervisors from the shell is that the supervisor process is linked to the shell process. When gen_server process crashes the exit signal is propagated up to the shell which crashes and get restarted .. and that will be used for testing only, otherwise, the OTP application should start the supervisor and get linked to it.
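The effect of the unlink/1 workaround from the question can be checked in isolation. The sketch below uses a stand-in supervisor with no children; the module name unlinked_sup and its restart limits are assumptions for illustration only.

```erlang
%% unlinked_sup.erl (hypothetical stand-in supervisor for shell testing)
-module(unlinked_sup).
-behaviour(supervisor).
-export([start_link/0, start_link_shell/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% For shell experiments only: start the supervisor, then remove the
%% link to the calling process so a crash of the caller (for example
%% the shell) no longer takes the supervisor down with it.
start_link_shell() ->
    {ok, Pid} = start_link(),
    true = unlink(Pid),
    {ok, Pid}.

init([]) ->
    {ok, {#{strategy => one_for_one, intensity => 2, period => 5}, []}}.
```

After unlinked_sup:start_link_shell(), the supervisor pid no longer appears in process_info(self(), links) for the shell process. In a real system the supervisor would instead be started and linked by its application or by a parent supervisor.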
