I'm trying to implement a simple supervisor and just have it restart child processes if they fail. But I don't even know how to spawn more than one process under a supervisor! I looked at simple supervisor code on this site and found this:
-module(echo_sup).
-behaviour(supervisor).
-export([start_link/0]).
-export([init/1]).
start_link() ->
    {ok, Pid} = supervisor:start_link(echo_sup, []),
    unlink(Pid).
init(_Args) ->
    {ok, {{one_for_one, 5, 60},
          [{echo_server, {echo_server, start_link, []},
            permanent, brutal_kill, worker, [echo_server]},
           {echo_server2, {echo_server2, start_link, []},
            permanent, brutal_kill, worker, [echo_server2]}]}}.
I assumed that putting the "echo_server2" part in the init() function would spawn another process under this supervisor, but I end up getting an "exception exit: shutdown" message.
Both files, "echo_server" and "echo_server2", contain the same code under different names. So I'm just confused now.
-module(echo_server2).
-behaviour(gen_server).
-export([start_link/0]).
-export([echo/1, crash/0]).
-export([init/1, handle_call/3, handle_cast/2]).
start_link() ->
    {ok,Pid} = gen_server:start_link({local, echo_server2}, echo_server2, [], []),
    unlink(Pid).
%% public api
echo(Text) ->
    gen_server:call(echo_server2, {echo, Text}).
crash() ->
    gen_server:call(echo_server2, crash).
%% behaviours
init(_Args) ->
    {ok, none}.
handle_call(crash, _From, State) ->
    X=1,
    {reply, X=2, State};
handle_call({echo, Text}, _From, State) ->
    {reply, Text, State}.
handle_cast(_, State) ->
    {noreply, State}.
First, you need to read some docs about OTP gen_server and supervisors. You have a few errors in your code.
1) In the echo_sup module, change your start_link function as follows:
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).
I don't know why you call unlink/1 after the process has been started.
2) In both echo_servers, change the start_link function to:
start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).
You should not change the return value of this function, because the supervisor expects one of these values:
{ok,Pid} | ignore | {error,Error}
You don't need two different modules just to run two instances of the same server. The conflict problem is due to the tag in the child specification which has to be unique. It is the first element in the tuple. So you could have something like:
[{echo_server, {echo_server, start_link, []},
  permanent, brutal_kill, worker, [echo_server]},
 {echo_server2, {echo_server, start_link, []},
  permanent, brutal_kill, worker, [echo_server]}]}}.
Why do you unlink the child processes? The supervisor uses these links to supervise its children. The error you are getting is that the supervisor expects the functions which start the children to return {ok,ChildPid}, this is how it gets the pid of the children, so when it gets another return value it fails the startup of the children and then gives up itself. All according to how it is supposed to work.
If you want to register both servers, you could modify the start_link function to take the name as an argument, so that you can pass it in explicitly through the child spec. So:
start_link(Name) ->
    gen_server:start_link({local, Name}, ?MODULE, [], []).
and
[{echo_server, {echo_server, start_link, [echo_server]},
  permanent, brutal_kill, worker, [echo_server]},
 {echo_server2, {echo_server, start_link, [echo_server2]},
  permanent, brutal_kill, worker, [echo_server]}]}}.
Using the module name as the registered name for a server is just a convention which only works if you run one instance of the server.
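To make the two points above concrete (unique child ids, and start functions that return {ok, Pid} without unlinking), here is a minimal single-module sketch. It is not the asker's code: the module name echo_sup_demo, the child names echo_a and echo_b, and the start_worker/1 helper are made up for illustration, and a plain linked process stands in for echo_server so that the example fits in one file.

```erlang
-module(echo_sup_demo).
-behaviour(supervisor).
-export([start_link/0, start_worker/1, echo/2]).
-export([init/1]).

%% Return supervisor:start_link/3's result unchanged; no unlink/1.
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% The first element of each child spec (the child id) must be unique.
    Children = [child(echo_a), child(echo_b)],
    {ok, {{one_for_one, 5, 60}, Children}}.

%% Same start function for both children, different argument.
child(Name) ->
    {Name, {?MODULE, start_worker, [Name]},
     permanent, brutal_kill, worker, [?MODULE]}.

%% Hypothetical stand-in for echo_server:start_link(Name).
%% It links to the new process and returns {ok, Pid}, as the supervisor expects.
start_worker(Name) ->
    Pid = spawn_link(fun loop/0),
    true = register(Name, Pid),
    {ok, Pid}.

loop() ->
    receive
        {echo, From, Ref, Text} ->
            From ! {Ref, Text},
            loop()
    end.

%% Send Text to the worker registered as Name and wait for the reply.
echo(Name, Text) ->
    Ref = make_ref(),
    Name ! {echo, self(), Ref, Text},
    receive
        {Ref, Reply} -> Reply
    after 1000 ->
        timeout
    end.
```

After echo_sup_demo:start_link(), killing one worker with exit(whereis(echo_a), kill) should lead the supervisor to start a replacement under the same name, because the link between supervisor and child is still in place.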
I'm working on building a supervisor in Erlang that looks like this:
-module(a_sup).
-behaviour(supervisor).
%% API
-export([start_link/0, init/1]).
start_link() ->
    {ok, supervisor:start_link({local,?MODULE}, ?MODULE, [])}.
init(_Args) ->
    RestartStrategy = {simple_one_for_one, 5, 3600},
    ChildSpec = {
        a_gen_server,
        {a_gen_server, start_link, []},
        permanent,
        brutal_kill,
        worker,
        [a_gen_server]
    },
    {ok, {RestartStrategy,[ChildSpec]}}.
And this is how my gen_server looks:
-module(a_gen_server).
-behavior(gen_server).
%% API
-export([start_link/2, init/1]).
start_link(Name, {X, Y}) ->
    gen_server:start_link({local, Name}, ?MODULE, [Name, {X,Y}], []),
    ok.
init([Name, {X,Y}]) ->
    process_flag(trap_exit, true),
    io:format("~p: position {~p,~p}~n",[Name, X, Y]),
    {ok, {X,Y}}.
My gen_server works completely fine. When I run the supervisor as:
1> c(a_sup).
{ok,a_sup}
2> Pid = a_sup:start_link().
{ok,{ok,<0.85.0>}}
3> supervisor:start_child(a_sup, [Hello, {4,3}]).
Hello: position {4,3}
{error,ok}
I couldn't understand where the {error, ok} is coming from, and if there is an error, then what is causing it. So this is what I get when I check the status of the children:
> supervisor:count_children(a_sup).
[{specs,1},{active,0},{supervisors,0},{workers,0}]
Does this mean that there are no children registered with the supervisor yet, despite it calling the init function of the gen_server and spawning a process? Clearly some error is preventing the call from completing successfully, but I can't find any hints to figure it out.
The problem is that a_gen_server:start_link (since that's the function used in the child spec) is expected to return {ok, Pid}, not just ok.
As the docs put it:
The start function must create and link to the child process, and must
return {ok,Child} or {ok,Child,Info}, where Child is the pid of the
child process and Info any term that is ignored by the supervisor.
The start function can also return ignore if the child process for
some reason cannot be started, in which case the child specification
is kept by the supervisor (unless it is a temporary child) but the
non-existing child process is ignored.
If something goes wrong, the function can also return an error tuple
{error,Error}.
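As a sketch of that fix, the start function can simply return whatever gen_server:start_link/4 returns. The handle_call/3, handle_cast/2 and handle_info/2 clauses below are placeholders added only so the module compiles cleanly; they are not part of the original question.

```erlang
-module(a_gen_server).
-behaviour(gen_server).
%% API
-export([start_link/2]).
%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% Pass gen_server:start_link/4's result ({ok, Pid} on success) straight
%% back to the supervisor instead of replacing it with ok.
start_link(Name, {X, Y}) ->
    gen_server:start_link({local, Name}, ?MODULE, [Name, {X, Y}], []).

init([Name, {X, Y}]) ->
    process_flag(trap_exit, true),
    io:format("~p: position {~p,~p}~n", [Name, X, Y]),
    {ok, {X, Y}}.

%% Placeholder callbacks (assumptions, not from the question).
handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(_Info, State) ->
    {noreply, State}.
```

With this change, supervisor:start_child(a_sup, [hello, {4,3}]) should return {ok, Pid} and the child should show up in supervisor:count_children/1. The same idea applies to a_sup:start_link/0, which can return supervisor:start_link/3's result directly rather than wrapping it in a second {ok, ...} tuple.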
I am running Erlang R16B03-1 (erts-5.10.4) on OS X 10.9.2. Erlang was installed using brew.
I am trying to run a gen_server module:
-module(logger).
-author("evangelosp").
-behaviour(gen_server).
%% API
-export([start/0, stop/0, log/2]).
%% gen_server callbacks
-export([init/1,
handle_call/3,
handle_cast/2,
handle_info/2,
terminate/2,
code_change/3]).
-define(SERVER, ?MODULE).
%%%===================================================================
%%% API
%%%===================================================================
start() -> gen_server:start_link({global, ?SERVER}, ?MODULE, [], []).
stop() -> gen_server:call(?MODULE, stop).
log(_Level, _MSG) -> gen_server:call(?MODULE, {add, {_Level, _MSG}}).
%%%===================================================================
%%% gen_server callbacks
%%%===================================================================
init([]) -> {ok, ets:new(?MODULE, [])}.
handle_call(_Request, _From, Table) -> {reply, {ok, ["Mplah!", _Request, _From, Table]}, Table}.
handle_cast(_Request, State) -> {noreply, State}.
handle_info(_Info, State) -> {noreply, State}.
terminate(_Reason, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.
In the erlang shell I am running:
Eshell V5.10.4 (abort with ^G)
1> c(logger).
{ok,logger}
2> logger:start().
{ok,<0.40.0>}
3> logger:log(info, "Hello World").
** exception exit: {noproc,{gen_server,call,
[logger,{add,{info,"Hello World"}}]}}
in function gen_server:call/2 (gen_server.erl, line 180)
And I can't get rid of that exception. I haven't found any useful resource by looking up the exception message, apart from this, which didn't help much.
Cheers.
In your code, start() -> gen_server:start_link({global, ?SERVER}, ?MODULE, [], [])., you use {global, ?SERVER}, which means that:
If ServerName={global,GlobalName} the gen_server is registered
globally as GlobalName using global:register_name/2.
So when you send a message to the server, you should write log(_Level, _MSG) -> gen_server:call({global, ?MODULE}, {add, {_Level, _MSG}}). Please see the Erlang docs:
call(ServerRef, Request, Timeout) -> Reply
Types:
ServerRef = Name | {Name,Node} | {global,GlobalName} |
ServerRef can be:
Name, if the gen_server is locally registered,
{global,GlobalName}, if the gen_server is globally registered.
Your server is registered globally but not locally, so the local name used by the interface functions does not resolve to it. Those functions use the predefined macro ?MODULE (which is replaced by logger at compilation) to access it.
You need either to change your interface functions or to also register the server locally in the start function:
start() ->
    {ok,Pid} = gen_server:start_link({global, ?SERVER}, ?MODULE, [], []),
    register(?MODULE,Pid).
[edit] Thanks Evalon, I made the correction in the answer :o)
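The first option (using the same {global, Name} reference for both registration and calls) could look roughly like the sketch below. The module name log_server is an assumption chosen for this example, partly because recent OTP releases ship their own logger module; the stop and add clauses are illustrative, not the asker's implementation.

```erlang
-module(log_server).
-behaviour(gen_server).
%% API
-export([start/0, stop/0, log/2]).
%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2]).

%% One definition of the server reference, used for registering AND calling.
-define(SERVER, {global, ?MODULE}).

start() -> gen_server:start_link(?SERVER, ?MODULE, [], []).
stop() -> gen_server:call(?SERVER, stop).
log(Level, Msg) -> gen_server:call(?SERVER, {add, {Level, Msg}}).

init([]) -> {ok, ets:new(?MODULE, [])}.

%% Stop the server on request, replying ok to the caller first.
handle_call(stop, _From, Table) ->
    {stop, normal, ok, Table};
%% Store each entry under a fresh, increasing integer key.
handle_call({add, Entry}, _From, Table) ->
    true = ets:insert(Table, {erlang:unique_integer([monotonic]), Entry}),
    {reply, ok, Table}.

handle_cast(_Msg, Table) -> {noreply, Table}.
```

Keeping the reference in a single macro means start/0 and the interface functions cannot drift apart, which is what caused the noproc exit in the question.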
The supervisor is an OTP behavior.
init([]) ->
    RoomSpec = {mod_zytm_room, {mod_zytm_room, start_link, []},
                transient, brutal_kill, worker, [mod_zytm_room]},
    {ok, {{simple_one_for_one, 10, 10000}, [RoomSpec]}}.
The above code will invoke the child's terminate method.
But if I change brutal_kill to an integer timeout (e.g. 6000), the terminate method is never invoked.
I found this explanation in the Erlang documentation:
The dynamically created child processes of a simple-one-for-one
supervisor are not explicitly killed, regardless of shutdown strategy,
but are expected to terminate when the supervisor does (that is, when
an exit signal from the parent process is received).
But I cannot fully understand it. Does it mean that exit(Pid, kill) can terminate a simple_one_for_one child while exit(Pid, shutdown) can't?
===================================update====================================
mod_zytm_room_sup.erl
-module(mod_zytm_room_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, open_room/1, close_room/1]).
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).
init([]) ->
    RoomSpec = {mod_zytm_room, {mod_zytm_room, start_link, []},
                transient, brutal_kill, worker, [mod_zytm_room]},
    {ok, {{simple_one_for_one, 10, 10000}, [RoomSpec]}}.
open_room(RoomId) ->
    supervisor:start_child(?MODULE, [RoomId]).
close_room(RoomPid) ->
    supervisor:terminate_child(?MODULE, RoomPid).
mod_zytm_room.erl
-module(mod_zytm_room).
-behaviour(gen_server).
-export([start_link/1]).
-export([init/1, handle_cast/2, handle_info/2, handle_call/3, code_change/3, terminate/2]).
start_link(RoomId) ->
    gen_server:start_link(?MODULE, [RoomId], []).
init([RoomId]) ->
    {ok, []}.
terminate(_, _) ->
    error_logger:info_msg("~p terminated:~p", [?MODULE, self()]),
    ok.
...other methods omitted.
mod_zytm_sup.erl
-module(mod_zytm_sup).
-behaviour(gen_server).
-export([start_link/0]).
-export([init/1, handle_cast/2, handle_info/2, handle_call/3, code_change/3, terminate/2]).
start_link() ->
    gen_server:start_link(?MODULE, [], []).
init([]) ->
    {ok, []}.
%% invoked by an erlang:send_after event.
handle_info({'CLOSE_ROOM', RoomPid}, State) ->
    mod_zytm_room_sup:close_room(RoomPid),
    {noreply, State}.
...other methods omitted.
Both mod_zytm_sup and mod_zytm_room_sup are part of the system supervision tree; mod_zytm_sup calls mod_zytm_room_sup to create or close mod_zytm_room processes.
Sorry, I got the wrong result.
To make it clear:
The brutal_kill strategy kills the child process immediately, so terminate is not invoked.
The terminate method is invoked if the simple_one_for_one child's shutdown strategy is an integer timeout. The child must call process_flag(trap_exit, true) in its init callback.
FYI, from the Erlang documentation:
If the gen_server is part of a supervision tree and is ordered by its
supervisor to terminate, this function will be called with
Reason=shutdown if the following conditions apply:
the gen_server has been set to trap exit signals, and the shutdown
strategy as defined in the supervisor's child specification is an
integer timeout value, not brutal_kill.
The dynamically created child processes of a simple-one-for-one
supervisor are not explicitly killed, regardless of shutdown strategy,
but are expected to terminate when the supervisor does (that is, when
an exit signal from the parent process is received).
Note that this is no longer true. Since Erlang/OTP R15A, dynamic children are explicitly terminated as per the shutdown strategy.
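A small sketch of the trap_exit requirement follows. The module name room_demo and the Notify argument (a pid that is told when terminate/2 runs) are assumptions made up for this illustration; they are not part of mod_zytm_room.

```erlang
-module(room_demo).
-behaviour(gen_server).
-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2]).

%% Notify is a hypothetical pid that receives a message from terminate/2.
start_link(Notify) ->
    gen_server:start_link(?MODULE, [Notify], []).

init([Notify]) ->
    %% Without this flag, a shutdown exit signal from the parent ends the
    %% process directly and terminate/2 never gets a chance to run.
    process_flag(trap_exit, true),
    {ok, Notify}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(_Info, State) ->
    {noreply, State}.

%% Called with Reason = shutdown when the parent (normally the supervisor)
%% sends exit(Pid, shutdown) and the process is trapping exits.
terminate(Reason, Notify) ->
    Notify ! {terminated, self(), Reason},
    ok.
```

Under a supervisor, the matching child spec would use an integer shutdown value, for example {room_demo, {room_demo, start_link, []}, transient, 6000, worker, [room_demo]}. With brutal_kill the supervisor sends an untrappable kill signal instead, so terminate/2 is skipped.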
I have a server which is structured like this:
gateway.erl (supervisor of the supervisor) -> gateway_sup.erl (supervisor of gen_servers) -> gateway_serv.erl (where every client is handled).
This is a pretty basic layout; from what I have seen on the internet, most people do it this way.
The listen socket is created in gateway_sup.erl, and I would like to listen on multiple sockets in case some clients have port restrictions.
So here's my code so far.
gateway.erl
-export([start_link/0, init/1, startWithPort/1]).
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).
init([]) ->
    spawn_link(?MODULE, startWithPort, [8080]),
    spawn_link(?MODULE, startWithPort, [443]),
    {ok, {{simple_one_for_one, 3600, 3600},
          [{socket,
            {gateway_sup, start_link, []},
            permanent, 1000, supervisor, [gateway_sup]}
          ]}}.
startWithPort(Port) ->
    io:fwrite("Starting...: ~p~n", [Port]),
    supervisor:start_child(?MODULE, [Port]).
gateway_sup.erl
-export([start_socket/0, init/1, start_link/1]).
start_link(Port) ->
    io:fwrite("gateway_sup start_link Port: ~p // ~p~n", [list_to_atom(atom_to_list(?MODULE) ++ atom_to_list(Port)), Port]),
    supervisor:start_link({local, list_to_atom(atom_to_list(?MODULE) ++ atom_to_list(Port))}, ?MODULE, [Port]).
init([Port]) ->
    io:fwrite("gateway_sup Init with port: ~p~n", [Port]),
    R = ssl:listen(Port, ?SSL_OPTIONS),
    {ok, LSocket} = R,
    spawn_link(fun empty_listeners/0),
    {ok, {{simple_one_for_one, 3600, 3600},
          [{socket,
            {gateway_serv, start_link, [LSocket]},
            temporary, 1000, worker, [gateway_serv]}
          ]}}.
empty_listeners() ->
    [start_socket() || _ <- lists:seq(1,128)],
    ok.
start_socket() ->
    supervisor:start_child(?MODULE, []).
The start_link() function in gateway_sup.erl never appears to be called.
If gateway is one_for_one and I don't try to pass a parameter, everything works fine, but then I only listen on one hardcoded port.
I can't see why it won't call gateway_sup:start_link/1.
OK, found it! It took me a whole night to track down such a minor mistake.
The error was in how I built the registered name within start_link. Port is an integer, so atom_to_list(Port) fails with badarg:
    list_to_atom(atom_to_list(?MODULE) ++ atom_to_list(Port))
has been replaced with
    list_to_atom(atom_to_list(?MODULE) ++ lists:flatten(io_lib:format("~B", [Port])))
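For what it's worth, integer_to_list/1 does the same conversion a little more directly. A minimal sketch, where the module name port_name and the function name/2 are invented for illustration:

```erlang
-module(port_name).
-export([name/2]).

%% Build a registered name such as gateway_sup8080 from a base atom and an
%% integer port. integer_to_list/1 converts the integer to its decimal string,
%% which is the step atom_to_list/1 cannot do.
name(Base, Port) when is_atom(Base), is_integer(Port) ->
    list_to_atom(atom_to_list(Base) ++ integer_to_list(Port)).
```

For example, port_name:name(gateway_sup, 8080) yields the atom gateway_sup8080. Creating atoms dynamically is acceptable here only because the set of ports is small and fixed.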
I need to keep a gen_mod process running as it loops every minute and does some cleanup. However once every few days it will crash and I'll have to manually start it back up again.
I could use a basic example of implementing a supervisor into ejabberd_sup so it can keep going. I am struggling to understand the examples that use gen_server.
Thanks for the help.
Here's an example module combining ejabberd's gen_mod and OTP's gen_server. Explanation is inlined in the code.
-module(cleaner).
-behaviour(gen_server).
-behaviour(gen_mod).
%% gen_mod requires these exports
-export([start/2, stop/1]).
%% these are exports for gen_server
%% (start_link/2 matches the [Host, Opts] arguments in the child spec below)
-export([start_link/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).
-define(INTERVAL, timer:minutes(1)).
-record(state, {}).
%% ejabberd calls this function when this module is loaded
%% basically it adds gen_server defined by this module to
%% ejabberd main supervisor
start(Host, Opts) ->
    Proc = gen_mod:get_module_proc(Host, ?MODULE),
    ChildSpec = {Proc,
                 {?MODULE, start_link, [Host, Opts]},
                 permanent,
                 1000,
                 worker,
                 [?MODULE]},
    supervisor:start_child(ejabberd_sup, ChildSpec).
%% this is called by ejabberd when module is unloaded, so it
%% does the opposite of start/2 :)
stop(Host) ->
    Proc = gen_mod:get_module_proc(Host, ?MODULE),
    supervisor:terminate_child(ejabberd_sup, Proc),
    supervisor:delete_child(ejabberd_sup, Proc).
%% it will be called by supervisor when it is time to start
%% this gen_server under control of supervisor
start_link(_Host, _Opts) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).
%% it is an initialization function for gen_server
%% it starts a timer, which sends 'tick' message periodically to itself
init(_) ->
    timer:send_interval(?INTERVAL, self(), tick),
    {ok, #state{}}.
handle_call(_Request, _From, State) ->
    Reply = ok,
    {reply, Reply, State}.
handle_cast(_Msg, State) ->
    {noreply, State}.
%% this function is called whenever gen_server receives a 'tick' message
handle_info(tick, State) ->
    State2 = do_cleanup(State),
    {noreply, State2};
handle_info(_Info, State) ->
    {noreply, State}.
terminate(_Reason, _State) ->
    ok.
code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
%% this function is called by handle_info/2 when tick message is received
%% so put all cleanup code here
do_cleanup(State) ->
    %% do all cleanup work here
    State.
This blog post gives a good explanation of how gen_servers work. Of course, make sure to re-read the OTP design principles on gen_server and on supervisor.
Ejabberd's module development is described here.