I have a simple supervisor that looks like this:
-module(a_sup).
-behaviour(supervisor).
%% API
-export([start_link/0, init/1]).
start_link() ->
    supervisor:start_link({local,?MODULE}, ?MODULE, []).
init(_Args) ->
    RestartStrategy = {simple_one_for_one, 5, 3600},
    ChildSpec = {
        a_gen_server,
        {a_gen_server, start_link, []},
        permanent,
        brutal_kill,
        worker,
        [a_gen_server]
    },
    {ok, {RestartStrategy,[ChildSpec]}}.
When I run this in the shell, it works perfectly fine. But now I want to run separate instances of this supervisor on different nodes, foo and bar (started with erl -sname foo and erl -sname bar), from a separate node called main (started with erl -sname main). This is how I try to start it: rpc:call('foo@My-MacBook-Pro', a_sup, start_link, []). It replies with {ok, Pid}, but immediately afterwards it fails with this message:
{ok,<9098.117.0>}
=ERROR REPORT==== 7-Mar-2022::16:05:45.416820 ===
** Generic server a_sup terminating
** Last message in was {'EXIT',<9098.116.0>,
{#Ref<0.3172713737.1597505552.87599>,return,
{ok,<9098.117.0>}}}
** When Server state == {state,
{local,a_sup},
simple_one_for_one,
{[a_gen_server],
#{a_gen_server =>
{child,undefined,a_gen_server,
{a_gen_server,start_link,[]},
permanent,false,brutal_kill,worker,
[a_gen_server]}}},
{maps,#{}},
5,3600,[],0,never,a_sup,[]}
** Reason for termination ==
** {#Ref<0.3172713737.1597505552.87599>,return,{ok,<9098.117.0>}}
(main@Prachis-MacBook-Pro)2> =CRASH REPORT==== 7-Mar-2022::16:05:45.416861 ===
crasher:
initial call: supervisor:a_sup/1
pid: <9098.117.0>
registered_name: a_sup
exception exit: {#Ref<0.3172713737.1597505552.87599>,return,
{ok,<9098.117.0>}}
in function gen_server:decode_msg/9 (gen_server.erl, line 481)
ancestors: [<9098.116.0>]
message_queue_len: 0
messages: []
links: []
dictionary: []
trap_exit: true
status: running
heap_size: 610
stack_size: 29
reductions: 425
neighbours:
From the message, it looks like the call expects the supervisor to be a gen_server instead? And when I start a gen_server on the node like this, it works just fine, but not with supervisors. I can't figure out whether there is something different about starting a supervisor on a local versus a remote node, and if so, what should I do to fix the issue?
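As per @JoséM's suggestion, the supervisor on the remote node is linked to the ephemeral process that rpc:call/4 uses to run start_link/0. When that process exits after returning the result, the supervisor receives an exit signal from its parent and terminates with the same reason, which is the {Ref, return, {ok, Pid}} term shown in the report. Since supervisor does not provide a start function (only start_link), modifying start_link() as
start_link() ->
    {ok, Pid} = supervisor:start_link({local,?MODULE}, ?MODULE, []),
    unlink(Pid),
    {ok, Pid}.
solves the issue.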
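To see the mechanism without a second node, here is a minimal sketch in which an ordinary spawned process stands in for the temporary process that rpc:call/4 uses on the remote node. The module name remote_start_demo and its run/1 helper are hypothetical, and the supervisor has no children because none are needed to observe its lifetime.

```erlang
-module(remote_start_demo).
-behaviour(supervisor).
-export([start_link/0, start_unlinked/0, init/1, run/1]).

%% Plain linked start, as in the question.
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% The fix from the answer: start linked, then drop the link to the
%% short-lived calling process before returning.
start_unlinked() ->
    {ok, Pid} = start_link(),
    unlink(Pid),
    {ok, Pid}.

%% No children: enough to observe whether the supervisor survives.
init([]) ->
    {ok, {{one_for_one, 5, 3600}, []}}.

%% Starts the supervisor from a throwaway process (standing in for the
%% rpc process), waits for that process to exit, and reports whether
%% the supervisor outlived it.
run(StartFun) ->
    Self = self(),
    spawn(fun() -> Self ! {started, ?MODULE:StartFun()} end),
    Pid = receive {started, {ok, P}} -> P end,
    Ref = monitor(process, Pid),
    receive
        {'DOWN', Ref, process, Pid, _Reason} -> dead
    after 500 ->
        %% Still running: kill it so the registered name is free again.
        exit(Pid, kill),
        receive {'DOWN', Ref, process, Pid, _} -> alive end
    end.
```

run(start_link) returns dead, because a supervisor terminates whenever its parent exits, even with reason normal, while run(start_unlinked) returns alive.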
When I call mnesia:create_schema on startup, the program crashes.
If I run my program from ebin without building a release, it works fine.
The error log is as follows:
=INFO REPORT==== 3-Jul-2013::09:44:06 ===
application: eddy
exited: {bad_return,
{{eddy_app,start,[normal,[]]},
{'EXIT',
{{badmatch,
{error,
{'EXIT',
{undef,
[{mnesia_backup,open_write,
["/home/cometeor/eddy/rel/eddy/Mnesia.eddy@127.0.0.1/eddy@127.0.0.1137284464686415846847780"],
[]},
{mnesia_bup,do_apply,4,
[{file,"mnesia_bup.erl"},{line,387}]},
{mnesia_bup,make_initial_backup,3,
[{file,"mnesia_bup.erl"},{line,378}]},
{mnesia_bup,create_schema,2,
[{file,"mnesia_bup.erl"},{line,348}]},
{eddy_database,start,0,
[{file,"src/eddy_database.erl"},{line,24}]},
{eddy_app,start,2,[{file,"src/eddy_app.erl"},{line,16}]},
{application_master,start_it_old,4,
[{file,"application_master.erl"},{line,274}]}]}}}},
[{eddy_database,start,0,
[{file,"src/eddy_database.erl"},{line,24}]},
{eddy_app,start,2,[{file,"src/eddy_app.erl"},{line,16}]},
{application_master,start_it_old,4,
[{file,"application_master.erl"},{line,274}]}]}}}}
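Resolved. I had to add
{app, mnesia, [{mod_cond, app}]},
to reltool.config. Presumably reltool had left out mnesia_backup, which mnesia only calls dynamically (through mnesia_bup:do_apply/4 in the trace above); {mod_cond, app} makes reltool include the modules listed in mnesia's .app file.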
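One way to confirm this kind of problem in a built release is to ask the running node where it would load the suspect modules from. A minimal sketch, using a hypothetical module release_check built on code:which/1, which returns non_existing for a module that is not on the code path:

```erlang
-module(release_check).
-export([locate/1, missing/1]).

%% For each module, reports the .beam file it would be loaded from,
%% or non_existing if it is not on the code path of the running node.
locate(Modules) ->
    [{M, code:which(M)} || M <- Modules].

%% Only the modules that cannot be found; calling any function in
%% one of these fails with undef.
missing(Modules) ->
    [M || {M, non_existing} <- locate(Modules)].
```

In the release's shell, release_check:missing([mnesia_backup]) (or simply code:which(mnesia_backup)) returning a non-empty result would point at the same missing-module problem.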
I am trying to run a custom application but get multiple errors. I believe the main egs app gets an error because it calls the egs_patch app, which has an undefined type. I can't figure out how to get this working. I have tried recompiling the code many times, following the advice given to others with a similar problem, but nothing seems to work. cowboy:start_listener remains undefined.
This is the error I receive.
=CRASH REPORT==== 10-Apr-2013::21:02:00 ===
crasher:
initial call: application_master:init/4
pid: <0.106.0>
registered_name: []
exception exit: {bad_return,
{{egs_patch_app,start,[normal,[]]},
{'EXIT',
{undef,
[{cowboy,start_listener,
[{patch,11030},
10,cowboy_tcp_transport,
[{port,11030}],
egs_patch_protocol,[]],
[]},
{egs_patch_app,start_listeners,1,
[{file,"src/egs_patch_app.erl"},
{line,44}]},
{egs_patch_app,start,2,
[{file,"src/egs_patch_app.erl"},
{line,31}]},
{application_master,start_it_old,4,
[{file,"application_master.erl"},
{line,274}]}]}}}}
in function application_master:init/4 (application_master.erl, line 138)
ancestors: [<0.105.0>]
messages: [{'EXIT',<0.107.0>,normal}]
links: [<0.105.0>,<0.7.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 610
stack_size: 27
reductions: 124
neighbours:
=INFO REPORT==== 10-Apr-2013::21:02:00 ===
application: egs_patch
exited: {bad_return,
{{egs_patch_app,start,[normal,[]]},
{'EXIT',
{undef,
[{cowboy,start_listener,
[{patch,11030},
10,cowboy_tcp_transport,
[{port,11030}],
egs_patch_protocol,[]],
[]},
{egs_patch_app,start_listeners,1,
[{file,"src/egs_patch_app.erl"},{line,44}]},
{egs_patch_app,start,2,
[{file,"src/egs_patch_app.erl"},{line,31}]},
{application_master,start_it_old,4,
[{file,"application_master.erl"},
{line,274}]}]}}}}
type: temporary
=CRASH REPORT==== 10-Apr-2013::21:02:00 ===
crasher:
initial call: application_master:init/4
pid: <0.75.0>
registered_name: []
exception exit: {bad_return,
{{egs_app,start,[normal,[]]},
{'EXIT',
{undef,
[{cowboy,start_listener,
[{login,12030},
10,cowboy_ssl_transport,
[{port,12030},
{certfile,"priv/ssl/servercert.pem"},
{keyfile,"priv/ssl/serverkey.pem"},
{password,"alpha"}],
egs_login_protocol,[]],
[]},
{egs_app,start_login_listeners,1,
[{file,"src/egs_app.erl"},{line,55}]},
{egs_app,start,2,
[{file,"src/egs_app.erl"},{line,38}]},
{application_master,start_it_old,4,
[{file,"application_master.erl"},
{line,274}]}]}}}}
in function application_master:init/4 (application_master.erl, line 138)
ancestors: [<0.74.0>]
messages: [{'EXIT',<0.76.0>,normal}]
links: [<0.74.0>,<0.7.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 987
stack_size: 27
reductions: 185
neighbours:
=INFO REPORT==== 10-Apr-2013::21:02:00 ===
application: egs
exited: {bad_return,
{{egs_app,start,[normal,[]]},
{'EXIT',
{undef,
[{cowboy,start_listener,
[{login,12030},
10,cowboy_ssl_transport,
[{port,12030},
{certfile,"priv/ssl/servercert.pem"},
{keyfile,"priv/ssl/serverkey.pem"},
{password,"alpha"}],
egs_login_protocol,[]],
[]},
{egs_app,start_login_listeners,1,
[{file,"src/egs_app.erl"},{line,55}]},
{egs_app,start,2,
[{file,"src/egs_app.erl"},{line,38}]},
{application_master,start_it_old,4,
[{file,"application_master.erl"},
{line,274}]}]}}}}
type: temporary
Here are the files from which the errors originate.
egs_patch_app.erl
-module(egs_patch_app).
-behaviour(application).
-export([start/2, stop/1]). %% API.
-type application_start_type()
    :: normal | {takeover, node()} | {failover, node()}.
%% API.
-spec start(application_start_type(), term()) -> {ok, pid()}.
start(_Type, _StartArgs) ->
    {ok, PatchPorts} = application:get_env(patch_ports),
    start_listeners(PatchPorts),
    egs_patch_sup:start_link().
-spec stop(term()) -> ok.
stop(_State) ->
    ok.
%% Internal.
-spec start_listeners([inet:ip_port()]) -> ok.
start_listeners([]) ->
    ok;
start_listeners([Port|Tail]) ->
    {ok, _Pid} = cowboy:start_listener({patch, Port}, 10,
        cowboy_tcp_transport, [{port, Port}],
        egs_patch_protocol, []),
    start_listeners(Tail).
egs_app.erl
-module(egs_app).
-behaviour(application).
-export([start/2, stop/1]). %% API.
-include("/home/mattk/Desktop/egs-master/apps/egs/include/records.hrl").
-type application_start_type()
    :: normal | {takeover, node()} | {failover, node()}.
-define(SSL_OPTIONS, [{certfile, "priv/ssl/servercert.pem"},
    {keyfile, "priv/ssl/serverkey.pem"}, {password, "alpha"}]).
%% API.
-spec start(application_start_type(), term()) -> {ok, pid()}.
start(_Type, _StartArgs) ->
    {ok, Pid} = egs_sup:start_link(),
    application:set_env(egs_patch, patch_ports, egs_conf:read(patch_ports)),
    application:start(egs_patch),
    start_login_listeners(egs_conf:read(login_ports)),
    {_ServerIP, GamePort} = egs_conf:read(game_server),
    {ok, _GamePid} = cowboy:start_listener({game, GamePort}, 10,
        cowboy_ssl_transport, [{port, GamePort}] ++ ?SSL_OPTIONS,
        egs_game_protocol, []),
    {ok, Pid}.
-spec stop(term()) -> ok.
stop(_State) ->
    ok.
%% Internal.
-spec start_login_listeners([inet:ip_port()]) -> ok.
start_login_listeners([]) ->
    ok;
start_login_listeners([Port|Tail]) ->
    {ok, _Pid} = cowboy:start_listener({login, Port}, 10,
        cowboy_ssl_transport, [{port, Port}] ++ ?SSL_OPTIONS,
        egs_login_protocol, []),
    start_login_listeners(Tail).
Here's our hint:
.....
{{egs_patch_app,start,[normal,[]]},
{'EXIT',
{undef,
[{cowboy,start_listener, .....
The tuple {egs_patch_app,start,[normal,[]]} tells us that the error occurred in egs_patch_app:start/2. The atom EXIT is the tag of a notification message sent when a process has exited, or the result of an expression like catch error(someerror). Now we get to the interesting part. undef means an attempt was made to call an undefined function. A function is undefined if its Name/Arity doesn't match any known function. In this case, the undefined function is cowboy:start_listener/6.
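A quick way to check this from the shell is to load the module and ask whether it exports the Name/Arity in question. The helper below is a sketch (the module api_check is hypothetical); erlang:function_exported/3 only reports on modules that are already loaded, hence the code:ensure_loaded/1 call first.

```erlang
-module(api_check).
-export([exported/3]).

%% Returns true if Module can be loaded and exports Function/Arity,
%% false otherwise (including when the module cannot be found at all).
exported(Module, Function, Arity) ->
    case code:ensure_loaded(Module) of
        {module, Module} -> erlang:function_exported(Module, Function, Arity);
        {error, _Reason} -> false
    end.
```

With a Cowboy version newer than the one egs was written against, api_check:exported(cowboy, start_listener, 6) would be expected to return false, which is exactly the undef above.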
Once again, the problem is that Cowboy has evolved while egs has not. Major changes in the Cowboy API have made the two incompatible. Since the last change in egs was about a year ago (assuming you're using essen's branch), you could try reverting to an older Cowboy tag by changing the corresponding rebar.config line to something like this:
{cowboy, ".*", {git, "git://github.com/extend/cowboy.git", {tag, "0.6.0"}}}
Notice how "HEAD" changed to {tag, "0.6.0"}. The Cowboy reference may have to be changed in several applications (at least egs and egs_patch). You'll quite possibly need to clear your deps/ first.
Erlang error messages can be difficult to parse, but as a general rule of thumb, you should be on the lookout for a few atoms:
case_clause, meaning no clause in a case expression matched.
function_clause, meaning no function clause matched the arguments.
undef, as noted above, meaning a call to an external (not local to module) function couldn't be resolved.
badarg, which is Erlang's "illegal argument" exception.
badarith, a sneaky bastard that sometimes shows up when you mistype a variable name as an atom in an arithmetic expression, such as 1/x instead of 1/X.
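To make the list concrete, the following sketch (hypothetical module error_atoms) triggers each of these errors at run time and catches the reason. Values are passed through a trivial id/1 function so that they are not literal constants in the failing expressions; the compiler may still warn about some of them.

```erlang
-module(error_atoms).
-export([demo/0]).

%% Runs Fun and returns which of the common error atoms it raised.
classify(Fun) ->
    try Fun() of
        _ -> no_error
    catch
        error:{case_clause, _} -> case_clause;
        error:function_clause -> function_clause;
        error:undef -> undef;
        error:badarg -> badarg;
        error:badarith -> badarith
    end.

demo() ->
    Three = id(3),
    Atom = id(x),
    Mod = id(no_such_module),
    [classify(fun() -> case Three of 1 -> one end end),  % no clause matches 3
     classify(fun() -> only_integers(Atom) end),         % guard rejects x
     classify(fun() -> Mod:f() end),                     % module does not exist
     classify(fun() -> atom_to_list(Three) end),         % BIF given an integer
     classify(fun() -> 1 / Atom end)].                   % arithmetic on an atom

%% Identity function, used only to hide constants from the compiler.
id(V) -> V.

only_integers(N) when is_integer(N) -> N.
```

error_atoms:demo() returns [case_clause, function_clause, undef, badarg, badarith]. Note that the case_clause error actually carries the unmatched value, as {case_clause, 3}.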
To learn more about Erlang's error handling mechanisms, read the docs.
I have a run-time error in the init part of a gen_server.
- init/1 begins with process_flag(trap_exit, true)
- the gen_server is part of a supervision tree
I try to print the reason in the terminate callback, but the process seems to exit elsewhere.
- Why is terminate not called?
The application stops with shutdown as the reason.
- How and where can I catch the run-time error?
The terminate callback is normally called in this situation, precisely because you have trapped exits.
The only case where this does not happen is when the crash occurs in the init function. In that case, the responsibility lies with the supervisor, which usually terminates itself as a result. The error then crawls up the supervision tree until it ends up terminating your whole application.
Usually, the supervisor will log a supervisor report with the context set to start_error. This is your hint that that part of the supervision tree has problems you should handle. You should check for this, because you may be making the wrong assumption about where the error occurs.
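If you want the error reported from inside the gen_server itself, one option is to catch it in init/1 and return {stop, Reason}. The sketch below (hypothetical module init_catch_demo, assuming OTP 21 or later for the Class:Reason:Stacktrace syntax) does that. terminate/2 is still not called, because init/1 never completed; under a supervisor, the start failure is then reported with context start_error as before, but preceded by your own log message.

```erlang
-module(init_catch_demo).
-behaviour(gen_server).
-export([start/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% start(fail) makes init/1 raise; start(ok) lets it succeed.
start(Mode) ->
    gen_server:start(?MODULE, Mode, []).

init(Mode) ->
    process_flag(trap_exit, true),
    try do_init(Mode) of
        State -> {ok, State}
    catch
        Class:Reason:Stacktrace ->
            %% Caught inside init/1, so it can be logged with context
            %% before the start is aborted.
            error_logger:error_msg("init failed: ~p:~p~n~p~n",
                                   [Class, Reason, Stacktrace]),
            {stop, Reason}
    end.

%% Stand-in for the real initialisation (socket setup and so on).
do_init(fail) -> erlang:error(blabla);
do_init(ok) -> ready.

handle_call(_Request, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.
handle_info(_Info, State) -> {noreply, State}.
terminate(_Reason, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.
```

init_catch_demo:start(fail) logs the error and returns {error, blabla}; start(ok) returns {ok, Pid}.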
EDITED FROM HERE
Your problem is that you don't know about SASL at all. Study it. Here is an example of how to use it.
Hoisted code from your example:
First, the bahlonga needed to tell Erlang we have a gen_server.
-module(foo).
-behaviour(gen_server).
-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).
We hack the #state{} record so it can be used with your code
-record(state, { name, port, socket_listen }).
Basic start_linkage...
start_link() ->
    gen_server:start_link({local, foo}, ?MODULE, [], []).
Your init function, spawn problem included.
init([]) ->
    Port = 3252,
    Name = "foo",
    %% Above we have hacked a bit for the sake of simplification...
    process_flag(trap_exit, true),
    erlang:error(blabla),
    Opts = [binary, {reuseaddr, true},
            {backlog,5}, {packet, 0}, {active, false}, {nodelay, true}],
    case gen_tcp:listen(Port,Opts) of
        {ok,Socket_Listen} ->
            logger:fmsg("--> [~s,init] Socket_Listen created = ~p",
                        [Name,Socket_Listen]),
            {ok,handle_accept(#state{socket_listen=Socket_Listen})};
        {error, Reason} ->
            logger:fmsg("--> [~s,init] Error, Reason = ~p",
                        [Name,Reason]), {stop, Reason}
    end.
Hacks for missing functions....
handle_accept(_) ->
    #state{}.
The rest is just the basics..., so I omit them.
Now for foo_sup, the supervisor for foo:
-module(foo_sup).
-behaviour(supervisor).
-export([start_link/0]).
-export([init/1]).
-define(SERVER, ?MODULE).
Basic start link...
start_link() ->
    supervisor:start_link({local, ?SERVER}, ?MODULE, []).
Basic ChildSpec. Get the foo child up and running...
init([]) ->
    FooCh = {foo, {foo, start_link, []},
             permanent, 2000, worker, [foo]},
    {ok, {{one_for_all,0,1}, [FooCh]}}.
Boot Erlang with SASL enabled:
jlouis@illithid:~$ erl -boot start_sasl
Erlang R14B02 (erts-5.8.3) [source] [64-bit] [smp:2:2] [rq:2] [async-threads:0] [hipe] [kernel-poll:false]
=PROGRESS REPORT==== 9-Dec-2010::01:01:51 ===
[..]
Eshell V5.8.3 (abort with ^G)
Let us try to spawn the supervisor...
1> foo_sup:start_link().
And we get this:
=CRASH REPORT==== 9-Dec-2010::01:05:48 ===
crasher:
initial call: foo:init/1
pid: <0.58.0>
registered_name: []
exception exit: {blabla,[{foo,init,1},
{gen_server,init_it,6},
{proc_lib,init_p_do_apply,3}]}
Above we see that we have a crash in foo:init/1 due to an exception blabla.
in function gen_server:init_it/6
ancestors: [foo_sup,<0.45.0>]
messages: []
links: [<0.57.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 233
stack_size: 24
reductions: 108
neighbours:
And now the supervisor gets to report about the problem!
=SUPERVISOR REPORT==== 9-Dec-2010::01:05:48 ===
Supervisor: {local,foo_sup}
Context: start_error
The context is exactly as I said it would be...
Reason: {blabla,[{foo,init,1},
{gen_server,init_it,6},
{proc_lib,init_p_do_apply,3}]}
And with the expected reason.
Offender: [{pid,undefined},
{name,foo},
{mfargs,{foo,start_link,[]}},
{restart_type,permanent},
{shutdown,2000},
{child_type,worker}]
I'm working through the Erlang documentation, trying to understand the basics of setting up an OTP gen_server and supervisor. Whenever my gen_server crashes, my supervisor crashes as well. In fact, whenever I have an error on the command line, my supervisor crashes.
I expect the gen_server to be restarted when it crashes. I expect command line errors to have no bearing whatsoever on my server components. My supervisor shouldn't be crashing at all.
The code I'm working with is a basic "echo server" that replies with whatever you send in, and a supervisor that will restart the echo_server 5 times per minute at most (one_for_one). My code:
echo_server.erl
-module(echo_server).
-behaviour(gen_server).
-export([start_link/0]).
-export([echo/1, crash/0]).
-export([init/1, handle_call/3, handle_cast/2]).
start_link() ->
    gen_server:start_link({local, echo_server}, echo_server, [], []).
%% public api
echo(Text) ->
    gen_server:call(echo_server, {echo, Text}).
crash() ->
    gen_server:call(echo_server, crash).
%% behaviours
init(_Args) ->
    {ok, none}.
handle_call(crash, _From, State) ->
    X=1,
    {reply, X=2, State};
handle_call({echo, Text}, _From, State) ->
    {reply, Text, State}.
handle_cast(_, State) ->
    {noreply, State}.
echo_sup.erl
-module(echo_sup).
-behaviour(supervisor).
-export([start_link/0]).
-export([init/1]).
start_link() ->
    supervisor:start_link(echo_sup, []).
init(_Args) ->
    {ok, {{one_for_one, 5, 60},
          [{echo_server, {echo_server, start_link, []},
            permanent, brutal_kill, worker, [echo_server]}]}}.
Compiled using erlc *.erl, and here's a sample run:
Erlang R13B01 (erts-5.7.2) [source] [smp:2:2] [rq:2] [async-threads:0] [kernel-poll:false]
Eshell V5.7.2 (abort with ^G)
1> echo_sup:start_link().
{ok,<0.37.0>}
2> echo_server:echo("hi").
"hi"
3> echo_server:crash().
=ERROR REPORT==== 5-May-2010::10:05:54 ===
** Generic server echo_server terminating
** Last message in was crash
** When Server state == none
** Reason for termination ==
** {'function not exported',
[{echo_server,terminate,
[{{badmatch,2},
[{echo_server,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},
none]},
{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]}
=ERROR REPORT==== 5-May-2010::10:05:54 ===
** Generic server <0.37.0> terminating
** Last message in was {'EXIT',<0.35.0>,
{{{undef,
[{echo_server,terminate,
[{{badmatch,2},
[{echo_server,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},
none]},
{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,[echo_server,crash]}},
[{gen_server,call,2},
{erl_eval,do_apply,5},
{shell,exprs,6},
{shell,eval_exprs,6},
{shell,eval_loop,3}]}}
** When Server state == {state,
{<0.37.0>,echo_sup},
one_for_one,
[{child,<0.41.0>,echo_server,
{echo_server,start_link,[]},
permanent,brutal_kill,worker,
[echo_server]}],
{dict,0,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[],[]}}},
5,60,
[{1273,79154,701110}],
echo_sup,[]}
** Reason for termination ==
** {{{undef,[{echo_server,terminate,
[{{badmatch,2},
[{echo_server,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},
none]},
{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,[echo_server,crash]}},
[{gen_server,call,2},
{erl_eval,do_apply,5},
{shell,exprs,6},
{shell,eval_exprs,6},
{shell,eval_loop,3}]}
** exception exit: {{undef,
[{echo_server,terminate,
[{{badmatch,2},
[{echo_server,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},
none]},
{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,[echo_server,crash]}}
in function gen_server:call/2
4> echo_server:echo("hi").
** exception exit: {noproc,{gen_server,call,[echo_server,{echo,"hi"}]}}
in function gen_server:call/2
5>
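The problem with testing supervisors from the shell is that the supervisor process is linked to the shell process. When the gen_server crashes in the middle of a gen_server:call/2 made from the shell, the call raises an exit in the shell process, which crashes and gets restarted. Since that shell process was the supervisor's parent, the supervisor receives its exit signal and terminates too (note the {'EXIT',<0.35.0>, ...} message in the supervisor's error report above).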
To avoid the problem, add something like this to the supervisor (and export it):
start_in_shell_for_testing() ->
    {ok, Pid} = supervisor:start_link(echo_sup, []),
    unlink(Pid),
    {ok, Pid}.
I would suggest that you debug/trace your application to see what's going on. It's very helpful for understanding how things work in OTP.
In your case, you might want to do the following.
Start the tracer:
dbg:tracer().
Trace all function calls for your supervisor and your gen_server:
dbg:p(all,c).
dbg:tpl(echo_server, x).
dbg:tpl(echo_sup, x).
Check which messages the processes are passing:
dbg:p(new, m).
See what's happening to your processes (crash, etc):
dbg:p(new, p).
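The dbg calls above can be bundled into a small helper so they are easy to repeat. This is a sketch (the module trace_helper is hypothetical) that uses only the dbg functions shown above plus dbg:stop/0 to switch tracing off again; x is dbg's shorthand match specification for also showing return values and exceptions.

```erlang
-module(trace_helper).
-export([trace/1, stop/0]).

trace(Modules) ->
    %% Start the tracer unless one is already running.
    case dbg:tracer() of
        {ok, _Tracer} -> ok;
        {error, already_started} -> ok
    end,
    %% Enable call tracing for all processes...
    {ok, _} = dbg:p(all, c),
    %% ...limited by trace patterns to the given modules, local functions
    %% included (tpl), with return values and exceptions (x).
    lists:foreach(fun(M) -> {ok, _} = dbg:tpl(M, x) end, Modules),
    %% Messages (m) and process events (p) for processes created from now on.
    {ok, _} = dbg:p(new, [m, p]),
    ok.

stop() ->
    dbg:stop().
```

From the shell, trace_helper:trace([echo_server, echo_sup]), then the scenario, then trace_helper:stop() reproduces the session above.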
For more information about tracing:
http://www.erlang.org/doc/man/dbg.html
http://aloiroberto.wordpress.com/2009/02/23/tracing-erlang-functions/
Hope this helps, both here and in future situations.
HINT: The gen_server behaviour expects the callback terminate/2 to be defined and exported ;)
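For reference, a version of echo_server with the missing callbacks defined and exported might look like this. It is a sketch that keeps the deliberate X = 2 badmatch from the question (the compiler will warn about it); only terminate/2, handle_info/2 and code_change/3 are new.

```erlang
-module(echo_server).
-behaviour(gen_server).
-export([start_link/0]).
-export([echo/1, crash/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link({local, echo_server}, echo_server, [], []).

%% public api
echo(Text) ->
    gen_server:call(echo_server, {echo, Text}).

crash() ->
    gen_server:call(echo_server, crash).

%% behaviours
init(_Args) ->
    {ok, none}.

handle_call(crash, _From, State) ->
    X = 1,
    {reply, X = 2, State};   % deliberate badmatch, as in the question
handle_call({echo, Text}, _From, State) ->
    {reply, Text, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(_Info, State) ->
    {noreply, State}.

%% Now defined and exported, so gen_server can call it on the way down
%% and the server exits with the original {badmatch, 2} reason.
terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```

The server now exits with {badmatch, 2} and can be restarted by echo_sup. Starting the supervisor from the shell still needs the unlink trick above, since the failed gen_server:call/2 also takes down the calling shell process.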
UPDATE: After defining terminate/2, the reason for the crash is evident from the trace. This is how it looks:
We (75) call the crash/0 function, and the gen_server (78) receives the request.
(<0.75.0>) call echo_server:crash()
(<0.75.0>) <0.78.0> ! {'$gen_call',{<0.75.0>,#Ref<0.0.0.358>},crash}
(<0.78.0>) << {'$gen_call',{<0.75.0>,#Ref<0.0.0.358>},crash}
(<0.78.0>) call echo_server:handle_call(crash,{<0.75.0>,#Ref<0.0.0.358>},none)
Uh-oh, a problem in handle_call. We have a badmatch...
(<0.78.0>) exception_from {echo_server,handle_call,3} {error,{badmatch,2}}
The terminate function is called. The server exits and it gets unregistered.
(<0.78.0>) call echo_server:terminate({{badmatch,2},
[{echo_server,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},none)
(<0.78.0>) returned from echo_server:terminate/2 -> ok
(<0.78.0>) exit {{badmatch,2},
[{echo_server,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}
(<0.78.0>) unregister echo_server
The supervisor (77) receives the exit signal from the gen_server and does its job:
(<0.77.0>) << {'EXIT',<0.78.0>,
{{badmatch,2},
[{echo_server,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}}
(<0.77.0>) getting_unlinked <0.78.0>
(<0.75.0>) << {'DOWN',#Ref<0.0.0.358>,process,<0.78.0>,
{{badmatch,2},
[{echo_server,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}}
(<0.77.0>) call echo_server:start_link()
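Well, it tries... but then what Filippo described happens: the shell process that made the call crashes, and the supervisor, which is linked to it, goes down with it.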
On the other hand, if the restart strategy has to be tested from the console at all, start the supervisor from the console and use pman to kill the worker process.
You will see that pman refreshes with the same supervisor pid but different worker pids, within the limits of the MaxR and MaxT you have set in the restart strategy.
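The same experiment can be scripted instead of clicking through pman. A sketch, assuming a hypothetical module restart_demo that is its own supervisor and uses a bare spawned process as a stand-in worker (not an OTP behaviour): it kills the worker repeatedly and collects the pids the supervisor reports through supervisor:which_children/1.

```erlang
-module(restart_demo).
-behaviour(supervisor).
-export([start/0, stop/0, kill_workers/1]).
-export([init/1, start_worker/0]).

%% Start unlinked from the caller (the shell), as suggested above.
start() ->
    {ok, Pid} = supervisor:start_link({local, ?MODULE}, ?MODULE, []),
    unlink(Pid),
    {ok, Pid}.

stop() ->
    gen_server:stop(?MODULE).

%% MaxR = 5 restarts within MaxT = 60 seconds, as in echo_sup.
init([]) ->
    Child = {worker, {?MODULE, start_worker, []},
             permanent, brutal_kill, worker, [?MODULE]},
    {ok, {{one_for_one, 5, 60}, [Child]}}.

%% Hypothetical stand-in worker: a bare linked process that just waits.
start_worker() ->
    {ok, spawn_link(fun loop/0)}.

loop() ->
    receive _ -> loop() end.

%% Kills the current worker N times and returns every worker pid seen,
%% oldest first.
kill_workers(N) ->
    kill_workers(N, [worker_pid()]).

kill_workers(0, Seen) ->
    lists:reverse(Seen);
kill_workers(N, [Current | _] = Seen) ->
    exit(Current, kill),
    kill_workers(N - 1, [wait_for_new_worker(Current) | Seen]).

worker_pid() ->
    [{worker, Pid, worker, _Modules}] = supervisor:which_children(?MODULE),
    Pid.

%% Polls until the supervisor reports a live replacement worker.
wait_for_new_worker(Old) ->
    case worker_pid() of
        Pid when is_pid(Pid), Pid =/= Old -> Pid;
        _ -> timer:sleep(10), wait_for_new_worker(Old)
    end.
```

kill_workers(3) should return four distinct worker pids while the supervisor pid stays the same. A sixth restart within 60 seconds would exceed MaxR = 5, at which point the supervisor shuts down its children and then itself.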