Erlang simple_one_for_one supervisor does not restart child

I have a test module and a simple_one_for_one supervisor.
test.erl
-module(test).
-export([
    run/1,
    do_job/1
]).
run(Fun) ->
    test_sup:start_child([Fun]).
do_job(Fun) ->
    Pid = spawn(Fun),
    io:format("started ~p~n", [Pid]),
    {ok, Pid}.
test_sup.erl
-module(test_sup).
-behaviour(supervisor).
-export([start_link/0]).
-export([init/1]).
-export([start_child/1]).
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).
init(_Args) ->
    SupFlags = #{strategy => simple_one_for_one, intensity => 2, period => 20},
    ChildSpecs = [#{id => test,
                    start => {test, do_job, []},
                    restart => permanent,
                    shutdown => brutal_kill,
                    type => worker,
                    modules => [test]}],
    {ok, {SupFlags, ChildSpecs}}.
start_child(Args) ->
    supervisor:start_child(?MODULE, Args).
I start the supervisor in the shell with test_sup:start_link(). After that I run test:run(fun() -> erlang:throw(err) end). I expect do_job/1 to be restarted two times, but it never is. What is the problem?
Here is the shell output:
1> test_sup:start_link().
{ok,<0.36.0>}
2> test:run(fun() -> erlang:throw(err) end).
started <0.38.0>
{ok,<0.38.0>}
3>
=ERROR REPORT==== 16-Dec-2016::22:08:41 ===
Error in process <0.38.0> with exit value:
{{nocatch,err},[{erlang,apply,2,[]}]}

A simple_one_for_one supervisor does restart its children automatically, according to each child's restart type. What it does not support is restarting or deleting them by hand. Per the supervisor docs:
Functions delete_child/2 and restart_child/2 are invalid for simple_one_for_one supervisors and return {error,simple_one_for_one} if the specified supervisor uses this restart strategy.
That restriction exists because a simple_one_for_one supervisor is intended for dynamic children that are defined on the fly by passing in additional startup args when you request the child, whereas other supervisors have the startup args statically defined in their child specs.
The reason nothing is restarted here is that do_job/1 calls spawn/1 instead of spawn_link/1. The start function runs inside the supervisor process and must create and link to the child. Without the link, the supervisor never receives the child's exit signal, so from its point of view the child never died.
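For comparison, here is a minimal sketch in which the start function links to the child with spawn_link/1, so that the supervisor receives the child's exit signal and can apply the child's restart type. The module name sofo_demo and the function start_worker/1 are hypothetical; the supervisor flags and child spec mirror the ones in the question, and both roles are folded into one module only to keep the example self-contained.

```erlang
-module(sofo_demo).
-behaviour(supervisor).
-export([start_link/0, start_child/1, start_worker/1, init/1]).

%% Start the simple_one_for_one supervisor, registered locally.
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    SupFlags = #{strategy => simple_one_for_one, intensity => 2, period => 20},
    ChildSpec = #{id => worker,
                  start => {?MODULE, start_worker, []},
                  restart => permanent,
                  shutdown => brutal_kill,
                  type => worker,
                  modules => [?MODULE]},
    {ok, {SupFlags, [ChildSpec]}}.

%% The extra argument list [Fun] is appended to the (empty) start args.
start_child(Fun) ->
    supervisor:start_child(?MODULE, [Fun]).

%% Runs inside the supervisor process, so spawn_link/1 links the
%% child to the supervisor (spawn/1 would not).
start_worker(Fun) ->
    Pid = spawn_link(Fun),
    io:format("started ~p~n", [Pid]),
    {ok, Pid}.
```

With a fun that throws immediately, the child should be restarted until the restart intensity is exceeded, at which point the supervisor itself terminates. If it was started from the shell with start_link/0, the shell process is linked to it and is affected as well.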

Related

Can't start process in erlang node

I have two Erlang nodes: node01 is 'vm01@192.168.146.128' and node02 is 'vm02@192.168.146.128'. I want to start a process on node01 by calling spawn(Node, Mod, Fun, Args) from node02, but the pid I get back is always useless.
The node connection is OK:
(vm02@192.168.146.128)14> net_adm:ping('vm01@192.168.146.128').
pong
The module is in the code path of both node01 and node02:
(vm01@192.168.146.128)7> m(remote_process).
Module: remote_process
MD5: 99784aa56b4feb2f5feed49314940e50
Compiled: No compile time info available
Object file: /src/remote_process.beam
Compiler options: []
Exports:
init/1
module_info/0
module_info/1
start/0
ok
(vm02@192.168.146.128)20> m(remote_process).
Module: remote_process
MD5: 99784aa56b4feb2f5feed49314940e50
Compiled: No compile time info available
Object file: /src/remote_process.beam
Compiler options: []
Exports:
init/1
module_info/0
module_info/1
start/0
ok
However, the spawn does not succeed:
(vm02@192.168.146.128)21> spawn('vm01@192.168.146.128', remote_process, start, []).
I'm on node 'vm01@192.168.146.128'
<9981.89.0>
My pid is <9981.90.0>
(vm01@192.168.146.128)8> whereis(remote_process).
undefined
The process runs fine on the local node:
(vm02@192.168.146.128)18> remote_process:start().
I'm on node 'vm02@192.168.146.128'
My pid is <0.108.0>
{ok,<0.108.0>}
(vm02@192.168.146.128)24> whereis(remote_process).
<0.115.0>
But it fails on the remote node. Can anyone give me some ideas?
Here is the source code remote_process.erl:
-module(remote_process).
-behaviour(supervisor).
-export([start/0, init/1]).
start() ->
{ok, Pid} = supervisor:start_link({global, ?MODULE}, ?MODULE, []),
{ok, Pid}.
init([]) ->
io:format("I'm on node ~p~n", [node()]),
io:format("My pid is ~p~n", [self()]),
{ok, {{one_for_one, 1, 5}, []}}.

You register your process globally, which is what you need for this purpose, but whereis/1 only looks up local registrations. The function to retrieve a globally registered process is global:whereis_name(remote_process).
Edit: it works if
the two nodes are connected (check with nodes()),
the process is registered with the global module, and
the process is still alive.
If any of these conditions is not satisfied, you will get undefined.
Edit 2: start node 1 with werl -sname p1 and type in the shell:
(p1@W7FRR00423L)1> c(remote_process).
{ok,remote_process}
(p1@W7FRR00423L)2> remote_process:start().
I'm on node p1@W7FRR00423L
My pid is <0.69.0>
{ok,<0.69.0>}
(p1@W7FRR00423L)3> global:whereis_name(remote_process).
<0.69.0>
(p1@W7FRR00423L)4>
Then start a second node with werl -sname p2 and type in the shell (it is fine to connect the second node later; the global registration is "updated" when necessary):
(p2@W7FRR00423L)1> net_kernel:connect_node(p1@W7FRR00423L).
true
(p2@W7FRR00423L)2> nodes().
[p1@W7FRR00423L]
(p2@W7FRR00423L)3> global:whereis_name(remote_process).
<7080.69.0>
(p2@W7FRR00423L)4>
Edit 3:
In your test you spawn a process P1 on the remote node, and P1 executes remote_process:start/0.
That function calls supervisor:start_link/3, which spawns a new supervisor process P2 and links it to P1. After this, P1 has nothing left to do, so it exits. A supervisor terminates when its parent exits, so the linked process P2 dies too, and you get an undefined reply from the global:whereis_name call.
In my test, I start the process from the shell of the remote node; the shell process does not exit after evaluating remote_process:start/0, so the supervisor stays alive and global:whereis_name finds the requested pid.
If you want the supervisor to survive the call, you need an intermediate process that is spawned without a link, so that it does not die with its parent. Here is a small example based on your code:
-module(remote_process).
-behaviour(supervisor).
-export([start/0, init/1, local_spawn/0, remote_start/1]).
remote_start(Node) ->
    spawn(Node, ?MODULE, local_spawn, []).
local_spawn() ->
    % spawn without link so start_wait_stop will survive to
    % the death of local_spawn process
    spawn(fun start_wait_stop/0).
start_wait_stop() ->
    start(),
    receive
        stop -> ok
    end.
start() ->
    io:format("start (~p)~n", [self()]),
    {ok, Pid} = supervisor:start_link({global, ?MODULE}, ?MODULE, []),
    {ok, Pid}.
init([]) ->
    io:format("I'm on node ~p~n", [node()]),
    io:format("My pid is ~p~n", [self()]),
    {ok, {{one_for_one, 1, 5}, []}}.
In the shell on node 1 you get:
(p1@W7FRR00423L)1> net_kernel:connect_node(p2@W7FRR00423L).
true
(p1@W7FRR00423L)2> c(remote_process).
{ok,remote_process}
(p1@W7FRR00423L)3> global:whereis_name(remote_process).
undefined
(p1@W7FRR00423L)4> remote_process:remote_start(p2@W7FRR00423L).
<7080.68.0>
start (<7080.69.0>)
I'm on node p2@W7FRR00423L
My pid is <7080.70.0>
(p1@W7FRR00423L)5> global:whereis_name(remote_process).
<7080.70.0>
(p1@W7FRR00423L)6> global:whereis_name(remote_process).
undefined
and on node 2:
(p2@W7FRR00423L)1> global:registered_names(). % before step 4
[]
(p2@W7FRR00423L)2> global:registered_names(). % after step 4
[remote_process]
(p2@W7FRR00423L)3> rp(processes()).
[<0.0.0>,<0.1.0>,<0.4.0>,<0.30.0>,<0.31.0>,<0.33.0>,
<0.34.0>,<0.35.0>,<0.36.0>,<0.37.0>,<0.38.0>,<0.39.0>,
<0.40.0>,<0.41.0>,<0.42.0>,<0.43.0>,<0.44.0>,<0.45.0>,
<0.46.0>,<0.47.0>,<0.48.0>,<0.49.0>,<0.50.0>,<0.51.0>,
<0.52.0>,<0.53.0>,<0.54.0>,<0.55.0>,<0.56.0>,<0.57.0>,
<0.58.0>,<0.62.0>,<0.64.0>,<0.69.0>,<0.70.0>]
ok
(p2@W7FRR00423L)4> pid(0,69,0) ! stop. % between steps 5 and 6
stop
(p2@W7FRR00423L)5> global:registered_names().
[]

Erlang supervisor shutdowns after running child

I have a test module and a one_for_one supervisor.
test.erl
-module(test).
-export([do_job/1,run/2, start_worker/1]).
run(Id, Fun) ->
test_sup:start_child(Id, [Fun]).
do_job(Fun) ->
Fun().
start_worker(Args) ->
Pid = spawn_link(test, do_job, Args),
io:format("started ~p~n",[Pid]),
{ok, Pid}.
test_sup.erl
-module(test_sup).
-behaviour(supervisor).
-export([start_link/0]).
-export([init/1]).
-export([start_child/2]).
start_link() ->
supervisor:start_link({local, ?MODULE}, ?MODULE, []).
init(_Args) ->
SupFlags = #{strategy => one_for_one, intensity => 2, period => 20},
{ok, {SupFlags, []}}.
start_child(Id, Args) ->
ChildSpecs = #{id => Id,
start => {test, start_worker, [Args]},
restart => transient,
shutdown => brutal_kill,
type => worker,
modules => [test]},
supervisor:start_child(?MODULE, ChildSpecs).
Now I start the supervisor in the shell and run test:run(id, fun() -> erlang:throw(err) end). It works nicely and the worker is started three times by start_worker/1, but after that an exception occurs, the supervisor process shuts down, and I must start it again manually with test_sup:start_link(). What is the problem?
Shell output:
1> test_sup:start_link().
{ok,<0.36.0>}
2> test:run(id, fun() -> erlang:throw(err) end).
started <0.38.0>
started <0.39.0>
started <0.40.0>
{ok,<0.38.0>}
=ERROR REPORT==== 16-Dec-2016::23:31:50 ===
Error in process <0.38.0> with exit value:
{{nocatch,err},[{test,do_job,1,[]}]}
=ERROR REPORT==== 16-Dec-2016::23:31:50 ===
Error in process <0.39.0> with exit value:
{{nocatch,err},[{test,do_job,1,[]}]}
=ERROR REPORT==== 16-Dec-2016::23:31:50 ===
Error in process <0.40.0> with exit value:
{{nocatch,err},[{test,do_job,1,[]}]}
** exception exit: shutdown

What is the problem?
There is no "problem". It's working exactly as you told it to:
To prevent a supervisor from getting into an infinite loop of child process terminations and restarts, a maximum restart intensity is defined using two integer values specified with keys intensity and period in the above map. Assuming the values MaxR for intensity and MaxT for period, then, if more than MaxR restarts occur within MaxT seconds, the supervisor terminates all child processes and then itself.
Your supervisor's configuration says, "If I have to restart a child more than two times (intensity) in 20 seconds (period), then something is wrong, so just shut down." As to why you have to restart the supervisor manually, it's because your supervisor isn't supervised itself. Otherwise, the supervisor's supervisor might try to restart it based on its own configuration.
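To illustrate the last point, here is a minimal sketch of a supervisor that is itself supervised. The module name nested_sup_demo, the registered names outer_sup and inner_sup, and the outer restart limits are assumptions made for this example; the inner flags copy the ones from the question.

```erlang
-module(nested_sup_demo).
-behaviour(supervisor).
-export([start_link/0, start_inner/0, init/1]).

%% Outer supervisor: its only job is to keep the inner one alive.
start_link() ->
    supervisor:start_link({local, outer_sup}, ?MODULE, outer).

%% Inner supervisor: plays the role of test_sup from the question.
start_inner() ->
    supervisor:start_link({local, inner_sup}, ?MODULE, inner).

init(outer) ->
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 60},
    Inner = #{id => inner_sup,
              start => {?MODULE, start_inner, []},
              restart => permanent,
              shutdown => infinity,
              type => supervisor,
              modules => [?MODULE]},
    {ok, {SupFlags, [Inner]}};
init(inner) ->
    %% Same limits as in the question: more than 2 restarts
    %% within 20 seconds makes this supervisor give up.
    SupFlags = #{strategy => one_for_one, intensity => 2, period => 20},
    {ok, {SupFlags, []}}.
```

When the inner supervisor terminates, the outer one restarts it according to its own intensity and period. Note that children added dynamically with supervisor:start_child/2 are not carried over to the restarted inner supervisor, so they would have to be added again.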

Erlang supervisor does not restart child

I'm trying to learn about erlang supervisors. I have a simple printer process that prints hello every 3 seconds. I also have a supervisor that must restart the printer process if any exception occurs.
Here is my code:
test.erl:
-module(test).
-export([start_link/0]).
start_link() ->
io:format("started~n"),
Pid = spawn_link(fun() -> loop() end),
{ok, Pid}.
loop() ->
timer:sleep(3000),
io:format("hello~n"),
loop().
test_sup.erl:
-module(test_sup).
-behaviour(supervisor).
-export([start_link/0]).
-export([init/1]).
start_link() ->
supervisor:start_link({local, ?MODULE}, ?MODULE, []).
init(_Args) ->
SupFlags = #{strategy => one_for_one, intensity => 1, period => 5},
ChildSpecs = [#{id => test,
start => {test, start_link, []},
restart => permanent,
shutdown => brutal_kill,
type => worker,
modules => [test]}],
{ok, {SupFlags, ChildSpecs}}.
Now I start the supervisor with test_sup:start_link() and, after a few seconds, raise an exception. Why does the supervisor not restart the printer process?
Here is the shell output:
1> test_sup:start_link().
started
{ok,<0.36.0>}
hello
hello
hello
hello
2> erlang:error(err).
=ERROR REPORT==== 13-Dec-2016::00:57:10 ===
** Generic server test_sup terminating
** Last message in was {'EXIT',<0.34.0>,
{err,
[{erl_eval,do_apply,6,
[{file,"erl_eval.erl"},{line,674}]},
{shell,exprs,7,
[{file,"shell.erl"},{line,686}]},
{shell,eval_exprs,7,
[{file,"shell.erl"},{line,641}]},
{shell,eval_loop,3,
[{file,"shell.erl"},{line,626}]}]}}
** When Server state == {state,
{local,test_sup},
one_for_one,
[{child,<0.37.0>,test,
{test,start_link,[]},
permanent,brutal_kill,worker,
[test]}],
undefined,1,5,[],0,test_sup,[]}
** Reason for termination ==
** {err,[{erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,674}]},
{shell,exprs,7,[{file,"shell.erl"},{line,686}]},
{shell,eval_exprs,7,[{file,"shell.erl"},{line,641}]},
{shell,eval_loop,3,[{file,"shell.erl"},{line,626}]}]}
** exception error: err

Here's the architecture you've created with your files:
test_sup (supervisor)
^
|
v
test (worker)
Then you start your supervisor by calling start_link() in the shell. This creates another bidirectional link:
shell
^
|
v
test_sup (supervisor)
^
|
v
test (worker)
With a bidirectional link, if either side dies abnormally, an exit signal is sent to the other side.
When you run erlang:error, you're causing an error in your shell!
Your shell is linked to your supervisor and is its parent, so the supervisor terminates in response. By chain reaction, your worker gets killed too.
I think you intended to send the error condition to your worker rather than the shell:
Determine the Pid of your worker with supervisor:which_children(test_sup).
Call erlang:exit(Pid, Reason) on the worker's Pid.
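The two steps above can be sketched as a small helper. The module name kill_child_demo, the helper kill_child/3 and the child id printer are hypothetical; the supervisor and the printer loop are reduced versions of the ones in the question, included only so the example is self-contained.

```erlang
-module(kill_child_demo).
-behaviour(supervisor).
-export([start_link/0, start_worker/0, kill_child/3, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 1, period => 5},
    Child = #{id => printer,
              start => {?MODULE, start_worker, []},
              restart => permanent,
              shutdown => brutal_kill,
              type => worker,
              modules => [?MODULE]},
    {ok, {SupFlags, [Child]}}.

%% Worker that prints hello every 3 seconds, linked to the supervisor.
start_worker() ->
    {ok, spawn_link(fun loop/0)}.

loop() ->
    timer:sleep(3000),
    io:format("hello~n"),
    loop().

%% Look up the child with the given id and send it an exit signal,
%% leaving the calling process (for example the shell) untouched.
kill_child(Sup, Id, Reason) ->
    case lists:keyfind(Id, 1, supervisor:which_children(Sup)) of
        {Id, Pid, _Type, _Modules} when is_pid(Pid) ->
            exit(Pid, Reason);
        _ ->
            {error, not_running}
    end.
```

Calling kill_child_demo:kill_child(kill_child_demo, printer, err) from the shell should terminate only the worker, which the supervisor then restarts with a new pid.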

When you execute erlang:error(err)., you kill the calling process, which is your shell.
Because you used start_link to start the supervisor, it is killed too, and so is the loop.
The shell is restarted automatically (thanks to its own supervisor), but nobody restarts your test supervisor, so it cannot restart the loop.
To run this test, change start_link/0 in module test to:
start_link() ->
    Pid = spawn_link(fun() -> loop() end),
    io:format("started ~p~n",[Pid]),
    {ok, Pid}.
When the supervisor starts you will see:
started <0.xx.0>
where <0.xx.0> is the pid of the loop, and in the shell you can then call
exit(pid(0,xx,0), err).
to kill only the loop.

Gen Server Error noproc

I get the following error when I try to run my program through shell using erlang.mk.
=INFO REPORT==== 5-May-2016::05:47:57 ===
application: rad
exited: {bad_return,
{{rad_app,start,[normal,[]]},
{'EXIT',
{noproc,{gen_server,call,[rad_config,{lookup,port}]}}}}}
type: permanent
Eshell V6.4 (abort with ^G)
(rad@127.0.0.1)1> {"Kernel pid terminated",application_controller,"{application_start_failure,rad,{bad_return,{{rad_app,start,[normal,[]]},{'EXIT',{noproc,{gen_server,call,[rad_config,{lookup,port}]}}}}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,rad,{bad_return,{{rad_app,start,[normal,[]]},{'EXIT',{noproc,{gen_server,call,[radheart: Thu May 5 05:47:58 2016: _coErlang is crashing .. (waiting for crash dump file)nf
ig,{lookup,porheart: Thu May 5 05:47:58 2016: tWould reboot. Terminating.}
]}}}}}})
make: *** [run] Error 1
rad.app.src
{application, rad,
[
{description, "Awesome server written in Erlang"},
{vsn, "0.0.1"},
{registered, [rad_sup, rad_config]},
{modules, []},
{applications, [
kernel,
stdlib,
cowboy
]},
{mod, {rad_app, []}},
{env, []}
]}.
rad_config.erl
-module(rad_config).
-behaviour(gen_server).
%% API
-export([start_link/0]).
%% Gen Server Callbacks
-export([init/1 , handle_call/3 , handle_cast/2 , handle_info/2 , terminate/2 , code_change/3]).
-export([lookup/1]).
-define(SERVER, ?MODULE).
-record(state, {conf}).
start_link() ->
gen_server:start_link({local, ?SERVER}, ?MODULE, [], []).
init([]) ->
{ok, Conf} = file:consult("config/rad.cfg"),
{ok, #state{conf = Conf}}.
handle_call({lookup, Tag} , _From , State) ->
Reply = case lists:keyfind(Tag, 1, State#state.conf) of
{Tag, Value} ->
Value;
false ->
{error, noinstance}
end,
{reply, Reply, State};
handle_call(_Request, _From, State) ->
Reply = ok,
{reply, Reply, State}.
handle_cast(_Msg , State) ->
{noreply, State}.
handle_info(_Info , State) ->
{noreply, State}.
terminate(Reason , _State) ->
io:format("~n Server shutdown. Reason: ~s.~n", [Reason]),
ok.
code_change(_OldVsn , State , _Extra) ->
{ok, State}.
lookup(Tag) ->
gen_server:call(?SERVER, {lookup, Tag}).
rad_app.erl
-module(rad_app).
-behaviour(application).
-export([start/2]).
-export([stop/1]).
%% First we need to define and compile the dispatch list, a list of routes
%% that Cowboy will use to map requests to handler modules. Then we tell
%% Cowboy to listen for connections.
start(_Type, _Args) ->
Port = rad_config:lookup(port),
Dispatch = cowboy_router:compile([
{'_', [{"/", route_handler, []}]}
]),
{ok, _} = cowboy:start_http(rad_http_listener, 100,
[{port, Port}], [{env, [{dispatch, Dispatch}]}]),
%% Start the supervisor
rad_sup:start_link().
stop(_State) ->
ok.
rad_sup.erl
-module(rad_sup).
-behaviour(supervisor).
%% API
-export([start_link/0]).
%% Supervisor callbacks
-export([init/1, shutdown/0]).
-define(SERVER, ?MODULE).
%% Helper macro for declaring children of supervisor
-define(CHILD(I, Type), {I, {I, start_link, []}, permanent, 5000, Type, [I]}).
%% ===================================================================
%% API functions
%% ===================================================================
start_link() ->
supervisor:start_link({local, ?SERVER}, ?MODULE, []).
%% ===================================================================
%% Supervisor callbacks
%% ===================================================================
init([]) ->
RestartStrategy = simple_one_for_one,
MaxRestarts = 10,
MaxSecondsBwRestarts = 5,
SupFlag = {RestartStrategy, MaxRestarts, MaxSecondsBwRestarts},
Processes = [?CHILD(rad_config, worker)],
{ok, {SupFlag, Processes}}.
%% Supervisor can be shutdown by calling exit(SupPid,shutdown)
%% or, if it's linked to its parent, by parent calling exit/1.
shutdown() ->
exit(whereis(?MODULE), shutdown).
So basically I have two questions related to the error thrown here:
Is this error thrown because my gen_server is not able to start?
Regarding the file:consult/1 call in rad_config: where does this function look for the file? The parameter I pass is config/rad.cfg, but all the .erl files are stored in the src folder, and the src and config folders are at the same directory level. Is the parameter I pass to file:consult/1 correct? I have also tried ../config/rad.cfg, but I still get the same error.
Please help me out. I'm new to Erlang and have been trying to solve this error for quite some time. By the way, I am using Erlang 17.5.

First, it seems that when rad_app:start/2 runs, your rad_config server has not been started yet. So when you get to this line:
Port = rad_config:lookup(port)
You are actually calling:
lookup(Tag) ->
gen_server:call(?SERVER, {lookup, Tag}).
The gen_server is not started at that point, so you get a noproc error.
In addition, even once the server is running, a gen_server cannot make a gen_server:call to itself from inside its own callbacks. If you ever need a server to send itself a request, the best way is to spawn a new process and make the call from inside the spawned process.
You should read more about gen_server and OTP.
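The noproc error described above can be reproduced in isolation. The following is a minimal sketch; the module name noproc_demo and the registered name hypothetical_unstarted_server are made up for the example and stand in for a rad_config server that has not been started.

```erlang
-module(noproc_demo).
-export([call_unstarted/0]).

%% Call a locally registered name that no process holds.
%% gen_server:call/2 exits with {noproc, {gen_server, call, Args}},
%% which is the same shape as the error in the crash report.
call_unstarted() ->
    try
        gen_server:call(hypothetical_unstarted_server, {lookup, port})
    catch
        exit:{noproc, _} = Reason ->
            {caught, Reason}
    end.
```

In the application from the question, this suggests making sure rad_config is running (for example by starting the supervisor that owns it) before rad_app:start/2 calls rad_config:lookup/1.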

Supervisor crashes when worker does, where Supervisor shouldn't

I have an Erlang supervisor that supervises a worker server based on gen_server. I start my supervisor from the shell, which in turn starts my worker server with no problems. It looks like this:
start_link() ->
supervisor:start_link({local, ?SERVER}, ?MODULE, []).
But when I crash my worker server, my supervisor crashes with it for an unknown reason.
I found a fix for this on the internet. I use this:
start_link_shell() ->
{ok,Pid} = supervisor:start_link({local, ?SERVER}, ?MODULE, []),
unlink(Pid).
Now it works fine, but I don't understand why. Can anyone explain?
Update
This is my init function:
%%%===================================================================
init([]) ->
    % Building Supervisor specifications
    RestartStrategy = one_for_one,
    MaxRestarts = 2,
    MaxSecondsBetweenRestarts = 5000,
    SupFlags = {RestartStrategy, MaxRestarts, MaxSecondsBetweenRestarts},
    % Building Child specifications
    Restart = permanent,
    Shutdown = 2000, % Milliseconds the child is given to terminate after receiving the shutdown signal
    Type = worker,
    ChildSpec = {'db_server',
                 {'db_server', start_link, []},
                 Restart,
                 Shutdown,
                 Type,
                 ['db_server']},
    % Putting Supervisor and Child(ren) specifications in the return
    {ok, {SupFlags, [ChildSpec]}}.

As per this link:
The problem testing supervisors from the shell is that the supervisor process is linked to the shell process. When gen_server process crashes the exit signal is propagated up to the shell which crashes and get restarted .. and that will be used for testing only, otherwise, the OTP application should start the supervisor and get linked to it.
