Cowboy on Erlang crashes on shutdown

I'm getting a lot of errors on shutdown of my Erlang vm related to my cowboy handlers. I've got a simple_one_for_one supervisor running a start_listeners() function that runs cowboy:start_http().
Everything starts, no errors, handles requests normally.
If I shut down the Erlang VM, I get:
[error] Supervisor bitter_rpc_sup had child bitter_rpc_http_id started with bitter_rpc_sup:start_listeners() at undefined exit with reason killed in context shutdown_error
And a bunch of other errors related to the cowboy processes being killed and terminating abnormally. Does cowboy not follow OTP conventions for shutdown? Is there a way for me to intercept the shutdown at the supervisor and manually shut down all of the cowboy processes / ranch pool?
Where should I be looking to try and squash this error?

You can create a ranch child spec and add it to your supervisor:
init([]) ->
    %% define Ref, NbAcceptors, IP, Port, Dispatch
    ...
    WebChild = ranch:child_spec(Ref,
                                NbAcceptors,
                                ranch_tcp,
                                [{ip, IP}, {port, Port}],
                                cowboy_protocol,
                                [{env, [{dispatch, Dispatch}]}]),
    {ok, {{one_for_one, 10, 10}, [WebChild]}}.

Taking a hard look at the included Cowboy examples, the http server isn't supervised directly, but is running under the Cowboy application.
So I changed the supervisor for my rpc daemon to do nothing:
init([]) ->
Procs = [],
{ok, {{one_for_one, 10, 10}, Procs}}.
and instantiated the cowboy dispatcher in the main process, returning the empty supervisor from the application's start/2 callback.

Related

Erlang: Cannot start supervisor on another node

I have a simple supervisor that looks like this
-module(a_sup).
-behaviour(supervisor).
%% API
-export([start_link/0, init/1]).
start_link() ->
supervisor:start_link({local,?MODULE}, ?MODULE, []).
init(_Args) ->
RestartStrategy = {simple_one_for_one, 5, 3600},
ChildSpec = {
a_gen_server,
{a_gen_server, start_link, []},
permanent,
brutal_kill,
worker,
[a_gen_server]
},
{ok, {RestartStrategy,[ChildSpec]}}.
When I run this in the shell, it works perfectly fine. But now I want to run different instances of this supervisor on different nodes, called foo and bar (started as erl -sname foo and erl -sname bar), from a separate node called main (erl -sname main). This is how I try to start it: rpc:call('foo@My-MacBook-Pro', a_sup, start_link, []). The call replies with {ok, Pid}, but then immediately fails with this message:
{ok,<9098.117.0>}
=ERROR REPORT==== 7-Mar-2022::16:05:45.416820 ===
** Generic server a_sup terminating
** Last message in was {'EXIT',<9098.116.0>,
{#Ref<0.3172713737.1597505552.87599>,return,
{ok,<9098.117.0>}}}
** When Server state == {state,
{local,a_sup},
simple_one_for_one,
{[a_gen_server],
#{a_gen_server =>
{child,undefined,a_gen_server,
{a_gen_server,start_link,[]},
permanent,false,brutal_kill,worker,
[a_gen_server]}}},
{maps,#{}},
5,3600,[],0,never,a_sup,[]}
** Reason for termination ==
** {#Ref<0.3172713737.1597505552.87599>,return,{ok,<9098.117.0>}}
(main@Prachis-MacBook-Pro)2> =CRASH REPORT==== 7-Mar-2022::16:05:45.416861 ===
crasher:
initial call: supervisor:a_sup/1
pid: <9098.117.0>
registered_name: a_sup
exception exit: {#Ref<0.3172713737.1597505552.87599>,return,
{ok,<9098.117.0>}}
in function gen_server:decode_msg/9 (gen_server.erl, line 481)
ancestors: [<9098.116.0>]
message_queue_len: 0
messages: []
links: []
dictionary: []
trap_exit: true
status: running
heap_size: 610
stack_size: 29
reductions: 425
neighbours:
From the message it looks like the call expects the supervisor to be a gen_server instead? When I try to start a gen_server on the node like this, it works out just fine, but not with supervisors. I can't figure out whether starting a supervisor differs between local and remote nodes, and if so, what should be done to fix the issue?
As per @JoséM's suggestion: the supervisor on the remote node is linked to the ephemeral RPC process, so it terminates when that process exits. Since supervisor does not provide a non-linking start function, modifying start_link() as
start_link() ->
    {ok, Pid} = supervisor:start_link({local,?MODULE}, ?MODULE, []),
    unlink(Pid),
    {ok, Pid}.
solves the issue.
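The effect can be reproduced on a single node. The following is a minimal sketch, not the original poster's code: the module name unlink_demo and the helper survives_starter/1 are hypothetical, and a short-lived local process stands in for the RPC handler. It starts an empty supervisor from that process, once linked and once unlinked, and reports whether the supervisor is still alive after the starter has exited.

```erlang
-module(unlink_demo).
-behaviour(supervisor).

-export([run/0, start_link/0, start_unlinked/0, init/1]).

%% Linked start: the supervisor's parent is whichever process calls this.
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Start, then drop the link so the caller's exit no longer reaches it.
start_unlinked() ->
    {ok, Pid} = supervisor:start_link({local, ?MODULE}, ?MODULE, []),
    true = unlink(Pid),
    {ok, Pid}.

%% An empty supervisor is enough for the demonstration.
init([]) ->
    {ok, {{one_for_one, 5, 10}, []}}.

%% Returns {AliveWhenLinked, AliveWhenUnlinked}.
run() ->
    Linked = survives_starter(fun start_link/0),
    Unlinked = survives_starter(fun start_unlinked/0),
    {Linked, Unlinked}.

%% Start the supervisor from a short-lived process (a stand-in for the
%% RPC handler) and report whether it is still alive afterwards.
survives_starter(StartFun) ->
    Parent = self(),
    {Starter, StarterRef} =
        spawn_monitor(fun() ->
                              {ok, S} = StartFun(),
                              Parent ! {started, self(), S}
                      end),
    Sup = receive {started, Starter, P} -> P end,
    %% Wait until the starter process is gone.
    receive {'DOWN', StarterRef, process, Starter, _} -> ok end,
    SupRef = erlang:monitor(process, Sup),
    receive
        {'DOWN', SupRef, process, Sup, _} ->
            false
    after 500 ->
            %% Still alive: kill it so the registered name is freed again.
            exit(Sup, kill),
            receive {'DOWN', SupRef, process, Sup, _} -> ok end,
            true
    end.
```

With the linked variant the supervisor should go down together with its starter; with the unlinked variant it should outlive it, which is the behaviour wanted for a supervisor started through rpc:call.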

Where is poolboy being started implicitly? (Erlang database connection pools)

My English is not good, please bear with me!
I use poolboy for my database connection pool. I have read the README.md on GitHub: https://github.com/devinus/poolboy
But I can't work out where poolboy has already been started by the time I want to start it, and I get an error: already_started
My project's files: http://pastebin.com/zus6dGdz
I use cowboy as my HTTP server, but you can ignore that.
I start the program like this:
1. I compile with rebar:
$rebar clean & make
2. Then I run my program with erl:
$ erl -pa ebin/ -pa deps/*/ebin -s start server_start
But I get the following errors:
=CRASH REPORT==== 3-Feb-2015::17:47:27 ===
crasher:
initial call: poolboy:init/1
pid: <0.171.0>
registered_name: []
exception exit: {{badmatch,{error,{already_started,<0.173.0>}}},
[{poolboy,new_worker,1,
[{file,"src/poolboy.erl"},{line,260}]},
{poolboy,prepopulate,3,
[{file,"src/poolboy.erl"},{line,281}]},
{poolboy,init,3,[{file,"src/poolboy.erl"},{line,143}]},
{gen_server,init_it,6,
[{file,"gen_server.erl"},{line,306}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,237}]}]}
in function gen_server:init_it/6 (gen_server.erl, line 330)
ancestors: [hello_erlang_sup,<0.66.0>]
messages: []
links: [<0.172.0>,<0.173.0>,<0.170.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 610
stack_size: 27
reductions: 205
neighbours:
neighbour: [{pid,<0.173.0>},
{registered_name,db_mongo_handler},
{initial_call,{db_mongo_handler,init,['Argument__1']}},
{current_function,{gen_server,loop,6}},
{ancestors,[<0.172.0>,mg_pool1,hello_erlang_sup,<0.66.0>]},
{messages,[]},
{links,[<0.172.0>,<0.174.0>,<0.171.0>]},
{dictionary,[]},
{trap_exit,false},
{status,waiting},
{heap_size,233},
{stack_size,9},
{reductions,86}]
Please help me solve the problem! Thanks!
You are starting a pool of 10 workers with the same registered name. When a process is registered with a name and another process tries to register with the same name, you get the error already_started.
In your example code, the worker module for poolboy is db_mongo_handler. Poolboy tries to start 10 workers by calling db_mongo_handler:start_link/1 which is implemented as
start_link(Args) ->
gen_server:start_link({local, ?SERVER}, ?MODULE, Args, []).
The first worker can start but when the second worker starts it crashes with already_started.
Normally the workers of a pool of many similar workers should not have a registered name. Instead, only the pool has a name and when you need a worker, you ask poolboy to deliver a pid() of one of the workers using poolboy:checkout(mg_pool1).
To fix the code, change gen_server:start_link({local, ?SERVER}, ?MODULE, Args, []) to gen_server:start_link(?MODULE, Args, []). Then it will not be registered with a name.
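The name clash can be reproduced without poolboy. The sketch below is an illustration only: the module name name_clash_demo and the registered name name_clash_worker are made up, and it uses gen_server:start rather than start_link so that the calling process is not linked to the workers. It starts two servers under the same local name, then two servers with no name at all.

```erlang
-module(name_clash_demo).
-behaviour(gen_server).

-export([run/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Returns {SecondNamedStartClashed, UnnamedWorkersAreDistinct}.
run() ->
    Name = {local, name_clash_worker},
    %% The first registration succeeds.
    {ok, First} = gen_server:start(Name, ?MODULE, [], []),
    %% The second start under the same name is refused.
    Clash = gen_server:start(Name, ?MODULE, [], []),
    %% Unnamed workers can be started any number of times.
    {ok, A} = gen_server:start(?MODULE, [], []),
    {ok, B} = gen_server:start(?MODULE, [], []),
    Result = {Clash =:= {error, {already_started, First}}, A =/= B},
    lists:foreach(fun gen_server:stop/1, [First, A, B]),
    Result.

init([]) ->
    {ok, no_state}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

The {error, {already_started, Pid}} tuple returned by the second named start is the same term that appears inside the badmatch in the crash report above.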

How YAWS handles concurrent users

I wish to know which code is being executed in YAWS every time a new client uses its web server...
First I tried to understand how YAWS handles concurrent users... and tried the following .yaws page:
io:format("~nProcess Identifier: ~p Port: ~p Client: ~p YAWS pid: ~p ~n",[self(), A#arg.clisock, A#arg.client_ip_port, A#arg.pid]).
which should return the Pid , port and ip of each client... I opened this page on the same browser (Firefox) and opened two distinct tabs... this was printed:
Process Identifier: <0.65.0> Port: #Port<0.1211> Client: {{127,0,0,1},60451} YAWS pid: <0.65.0>
Process Identifier: <0.65.0> Port: #Port<0.1211> Client: {{127,0,0,1},60451} YAWS pid: <0.65.0>
for some reason, the same port and pid are being returned (hence, YAWS isn't creating a new port or new pid for each client).
When I try this out on Chrome this was printed:
Process Identifier: <0.71.0> Port: #Port<0.2998> Client: {{127,0,0,1},60543} YAWS pid: <0.71.0>
Process Identifier: <0.71.0> Port: #Port<0.2998> Client: {{127,0,0,1},60543} YAWS pid: <0.71.0>
Hence, why is YAWS not opening a new port or pid for each tab on the same browser?
Also, back to the original question: where, and in which code, does YAWS spawn a new PID or open a new port?
Thanks
Unless you're sure that your browsers open new HTTP connections for each tab, you're not really testing what you think you're testing. Instead, try this from a command line:
curl http://yaws_host:yaws_port/path/to/your/yaws/page.yaws
curl http://yaws_host:yaws_port/path/to/your/yaws/page.yaws
Yes, run it twice, as that is guaranteed to use two separate connections. You will then see that Yaws uses two distinct Erlang processes and TCP connections to handle the two requests:
Process Identifier: <0.59.0> Port: #Port<0.1181> Client: {{127,0,0,1},64977} YAWS pid: <0.59.0>
Process Identifier: <0.64.0> Port: #Port<0.3268> Client: {{127,0,0,1},64978} YAWS pid: <0.64.0>
As for where the Yaws code for dealing with connections resides, you can look in yaws_server.erl, in particular at the acceptor/1 function which launches processes to accept connections and the do_listen/2 function which opens sockets for listening.

gen_server and the run-time errors

I have a run-time error in the init part of a gen_server.
- init begins with process_flag(trap_exit,true)
- the gen_server is part of a supervision tree
I try to print the reason in the terminate function, but the process seems to exit elsewhere.
- Why is terminate not called?
The application stops with shutdown as the reason.
- How and where can I catch the run-time error?
The terminate callback is normally called in this situation, namely because you have trapped exits.
The only place where this is not the case is if the crash happens in the init-function. In that case, the responsibility is on the supervisor, who usually terminates itself as a result. Then this error crawls up the supervisor tree until it ends up terminating your whole application.
Usually, the supervisor will log a supervisor report with the context set to start_error. This is your hint that the part of the supervision tree has problems you should handle. You should check for this, because you may have the wrong assumption on where the error occurs.
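The difference between a crash in init/1 and a crash in a later callback can be checked directly. The following is a small sketch under stated assumptions: the module name init_crash_demo, the observer process and the terminate_called message are all hypothetical, and gen_server:start is used instead of start_link so that the failing start does not take the caller down. terminate/2 notifies the observer whenever it runs.

```erlang
-module(init_crash_demo).
-behaviour(gen_server).

-export([run/0]).
-export([init/1, handle_call/3, handle_cast/2, terminate/2]).

%% Returns {ResultOfFailedStart, TerminateRanAfterInitCrash,
%%          TerminateRanAfterCallCrash}.
run() ->
    %% Case 1: the run-time error happens inside init/1.
    InitResult = gen_server:start(?MODULE, {crash, self()}, []),
    InitTerminated = got_terminated(),
    %% Case 2: init/1 succeeds and the error happens in handle_call/3.
    {ok, Pid} = gen_server:start(?MODULE, {ok, self()}, []),
    {'EXIT', _} = (catch gen_server:call(Pid, crash)),
    CallTerminated = got_terminated(),
    {InitResult, InitTerminated, CallTerminated}.

%% True if terminate/2 notified us within a short grace period.
got_terminated() ->
    receive terminate_called -> true after 200 -> false end.

init({crash, _Observer}) ->
    process_flag(trap_exit, true),
    erlang:error(blabla);
init({ok, Observer}) ->
    process_flag(trap_exit, true),
    {ok, Observer}.

handle_call(crash, _From, _Observer) ->
    erlang:error(blabla).

handle_cast(_Msg, Observer) ->
    {noreply, Observer}.

%% The state is the observer pid; tell it that terminate/2 was called.
terminate(_Reason, Observer) ->
    Observer ! terminate_called,
    ok.
```

The failed start comes back to the caller as an {error, {blabla, Stacktrace}} tuple without terminate/2 having run, whereas the crash in handle_call/3 does go through terminate/2. Under a supervisor, that error tuple is what ends up in the start_error report.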
EDITED FROM HERE
Your problem is that you don't know about SASL at all. Study it. Here is an example of how to use it.
Hoisted code from your example:
First, the bahlonga needed to tell Erlang we have a gen_server.
-module(foo).
-behaviour(gen_server).
-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
terminate/2, code_change/3]).
We hack the #state{} record so it can be used with your code
-record(state, { name, port, socket_listen }).
Basic start_linkage...
start_link() ->
gen_server:start_link({local, foo}, ?MODULE, [], []).
Your init function, spawn problem included.
init([]) ->
Port = 3252,
Name = "foo",
Above we have hacked a bit for the sake of simplification...
process_flag(trap_exit, true),
erlang:error(blabla),
Opts = [binary, {reuseaddr, true},
{backlog,5}, {packet, 0}, {active, false}, {nodelay, true}],
case gen_tcp:listen(Port,Opts) of
{ok,Socket_Listen} ->
logger:fmsg("--> [~s,init] Socket_Listen crée = ~p",
[Name,Socket_Listen]),
{ok,handle_accept(#state{socket_listen=Socket_Listen})};
{error, Reason} ->
logger:fmsg("--> [~s,init] Erreur, Raison =~p",
[Name,Reason]), {stop, Reason}
end.
Hacks for missing functions....
handle_accept(_) ->
#state{}.
The rest is just the basics..., so I omit them.
Now for foo_sup the supervisor for foo:
-module(foo_sup).
-behaviour(supervisor).
-export([start_link/0]).
-export([init/1]).
-define(SERVER, ?MODULE).
Basic start link...
start_link() ->
supervisor:start_link({local, ?SERVER}, ?MODULE, []).
Basic ChildSpec. Get the foo child up and running...
init([]) ->
FooCh = {foo, {foo, start_link, []},
permanent, 2000, worker, [foo]},
{ok, {{one_for_all,0,1}, [FooCh]}}.
Boot Erlang with SASL enabled:
jlouis@illithid:~$ erl -boot start_sasl
Erlang R14B02 (erts-5.8.3) [source] [64-bit] [smp:2:2] [rq:2] [async-threads:0] [hipe] [kernel-poll:false]
=PROGRESS REPORT==== 9-Dec-2010::01:01:51 ===
[..]
Eshell V5.8.3 (abort with ^G)
Let us try to spawn the supervisor...
1> foo_sup:start_link().
And we get this:
=CRASH REPORT==== 9-Dec-2010::01:05:48 ===
crasher:
initial call: foo:init/1
pid: <0.58.0>
registered_name: []
exception exit: {blabla,[{foo,init,1},
{gen_server,init_it,6},
{proc_lib,init_p_do_apply,3}]}
Above we see that we have a crash in foo:init/1 due to an exception blabla.
in function gen_server:init_it/6
ancestors: [foo_sup,<0.45.0>]
messages: []
links: [<0.57.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 233
stack_size: 24
reductions: 108
neighbours:
And now the supervisor gets to report about the problem!
=SUPERVISOR REPORT==== 9-Dec-2010::01:05:48 ===
Supervisor: {local,foo_sup}
Context: start_error
The context is exactly as I said it would be...
Reason: {blabla,[{foo,init,1},
{gen_server,init_it,6},
{proc_lib,init_p_do_apply,3}]}
And with the expected reason.
Offender: [{pid,undefined},
{name,foo},
{mfargs,{foo,start_link,[]}},
{restart_type,permanent},
{shutdown,2000},
{child_type,worker}]

erlang OTP Supervisor crashing

I'm working through the Erlang documentation, trying to understand the basics of setting up an OTP gen_server and supervisor. Whenever my gen_server crashes, my supervisor crashes as well. In fact, whenever I have an error on the command line, my supervisor crashes.
I expect the gen_server to be restarted when it crashes. I expect command line errors to have no bearing whatsoever on my server components. My supervisor shouldn't be crashing at all.
The code I'm working with is a basic "echo server" that replies with whatever you send in, and a supervisor that will restart the echo_server 5 times per minute at most (one_for_one). My code:
echo_server.erl
-module(echo_server).
-behaviour(gen_server).
-export([start_link/0]).
-export([echo/1, crash/0]).
-export([init/1, handle_call/3, handle_cast/2]).
start_link() ->
    gen_server:start_link({local, echo_server}, echo_server, [], []).
%% public api
echo(Text) ->
    gen_server:call(echo_server, {echo, Text}).
crash() ->
    gen_server:call(echo_server, crash).
%% behaviours
init(_Args) ->
    {ok, none}.
handle_call(crash, _From, State) ->
    X=1,
    {reply, X=2, State};
handle_call({echo, Text}, _From, State) ->
    {reply, Text, State}.
handle_cast(_, State) ->
    {noreply, State}.
echo_sup.erl
-module(echo_sup).
-behaviour(supervisor).
-export([start_link/0]).
-export([init/1]).
start_link() ->
supervisor:start_link(echo_sup, []).
init(_Args) ->
{ok, {{one_for_one, 5, 60},
[{echo_server, {echo_server, start_link, []},
permanent, brutal_kill, worker, [echo_server]}]}}.
Compiled using erlc *.erl, and here's a sample run:
Erlang R13B01 (erts-5.7.2) [source] [smp:2:2] [rq:2] [async-threads:0] [kernel-poll:false]
Eshell V5.7.2 (abort with ^G)
1> echo_sup:start_link().
{ok,<0.37.0>}
2> echo_server:echo("hi").
"hi"
3> echo_server:crash().
=ERROR REPORT==== 5-May-2010::10:05:54 ===
** Generic server echo_server terminating
** Last message in was crash
** When Server state == none
** Reason for termination ==
** {'function not exported',
[{echo_server,terminate,
[{{badmatch,2},
[{echo_server,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},
none]},
{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]}
=ERROR REPORT==== 5-May-2010::10:05:54 ===
** Generic server <0.37.0> terminating
** Last message in was {'EXIT',<0.35.0>,
{{{undef,
[{echo_server,terminate,
[{{badmatch,2},
[{echo_server,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},
none]},
{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,[echo_server,crash]}},
[{gen_server,call,2},
{erl_eval,do_apply,5},
{shell,exprs,6},
{shell,eval_exprs,6},
{shell,eval_loop,3}]}}
** When Server state == {state,
{<0.37.0>,echo_sup},
one_for_one,
[{child,<0.41.0>,echo_server,
{echo_server,start_link,[]},
permanent,brutal_kill,worker,
[echo_server]}],
{dict,0,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[],[]}}},
5,60,
[{1273,79154,701110}],
echo_sup,[]}
** Reason for termination ==
** {{{undef,[{echo_server,terminate,
[{{badmatch,2},
[{echo_server,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},
none]},
{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,[echo_server,crash]}},
[{gen_server,call,2},
{erl_eval,do_apply,5},
{shell,exprs,6},
{shell,eval_exprs,6},
{shell,eval_loop,3}]}
** exception exit: {{undef,
[{echo_server,terminate,
[{{badmatch,2},
[{echo_server,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},
none]},
{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,[echo_server,crash]}}
in function gen_server:call/2
4> echo_server:echo("hi").
** exception exit: {noproc,{gen_server,call,[echo_server,{echo,"hi"}]}}
in function gen_server:call/2
5>
The problem with testing supervisors from the shell is that the supervisor process is linked to the shell process. When the gen_server process crashes, the failed gen_server:call raises an exit in the shell process, which crashes and gets restarted; the exit signal then reaches the linked supervisor and terminates it as well.
To avoid the problem add something like this to the supervisor:
start_in_shell_for_testing() ->
    {ok, Pid} = supervisor:start_link(echo_sup, []),
    unlink(Pid).
I would suggest you to debug/trace your application to check what's going on. It's very helpful in understanding how things work in OTP.
In your case, you might want to do the following.
Start the tracer:
dbg:tracer().
Trace all function calls for your supervisor and your gen_server:
dbg:p(all,c).
dbg:tpl(echo_server, x).
dbg:tpl(echo_sup, x).
Check which messages the processes are passing:
dbg:p(new, m).
See what's happening to your processes (crash, etc):
dbg:p(new, p).
For more information about tracing:
http://www.erlang.org/doc/man/dbg.html
http://aloiroberto.wordpress.com/2009/02/23/tracing-erlang-functions/
Hope this can help for this and future situations.
HINT: The gen_server behaviour is expecting the callback terminate/2 to be defined and exported ;)
UPDATE: After the definition of the terminate/2 the reason of the crash is evident from the trace. This is how it looks:
We (75) call the crash/0 function. This is received by the gen_server (78).
(<0.75.0>) call echo_server:crash()
(<0.75.0>) <0.78.0> ! {'$gen_call',{<0.75.0>,#Ref<0.0.0.358>},crash}
(<0.78.0>) << {'$gen_call',{<0.75.0>,#Ref<0.0.0.358>},crash}
(<0.78.0>) call echo_server:handle_call(crash,{<0.75.0>,#Ref<0.0.0.358>},none)
Uh, problem on the handle call. We have a badmatch...
(<0.78.0>) exception_from {echo_server,handle_call,3} {error,{badmatch,2}}
The terminate function is called. The server exits and it gets unregistered.
(<0.78.0>) call echo_server:terminate({{badmatch,2},
[{echo_server,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},none)
(<0.78.0>) returned from echo_server:terminate/2 -> ok
(<0.78.0>) exit {{badmatch,2},
[{echo_server,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}
(<0.78.0>) unregister echo_server
The supervisor (77) receives the exit signal from the gen_server and does its job:
(<0.77.0>) << {'EXIT',<0.78.0>,
{{badmatch,2},
[{echo_server,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}}
(<0.77.0>) getting_unlinked <0.78.0>
(<0.75.0>) << {'DOWN',#Ref<0.0.0.358>,process,<0.78.0>,
{{badmatch,2},
[{echo_server,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}}
(<0.77.0>) call echo_server:start_link()
Well, it tries... but then what Filippo described happens.
On the other hand, if the restart strategy has to be tested from the console, start the supervisor from the console and use pman to kill the worker process.
You will see that pman refreshes with the same supervisor pid but different worker pids, depending on the MaxR and MaxT you have set in the restart strategy.
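The same check can be scripted instead of done by hand. This is only a sketch: the module restart_demo, the child id demo_worker and the polling helper are invented for the illustration, and the worker is a bare proc_lib process rather than the echo_server from the question. The supervisor is unlinked from the caller, the worker is killed, and the child list is polled until a new pid shows up.

```erlang
-module(restart_demo).
-behaviour(supervisor).

-export([run/0, start_worker/0, init/1]).

%% Returns {WorkerWasRestarted, NewPidDiffersFromOld}.
run() ->
    {ok, Sup} = supervisor:start_link(?MODULE, []),
    true = unlink(Sup),              % keep the caller out of the picture
    Old = worker_pid(Sup),
    exit(Old, kill),                 % simulate a worker crash
    New = wait_for_new_worker(Sup, Old, 50),
    exit(Sup, kill),                 % clean up; the linked worker dies too
    {is_pid(New), New =/= Old}.

init([]) ->
    Child = {demo_worker, {?MODULE, start_worker, []},
             permanent, brutal_kill, worker, [?MODULE]},
    {ok, {{one_for_one, 5, 60}, [Child]}}.

%% A trivial worker: a linked process that just waits for a message.
start_worker() ->
    {ok, proc_lib:spawn_link(fun() -> receive stop -> ok end end)}.

%% The child entry holds a pid, or undefined/restarting in between.
worker_pid(Sup) ->
    [{demo_worker, Pid, worker, _}] = supervisor:which_children(Sup),
    Pid.

%% Poll until the supervisor reports a pid other than the old one.
wait_for_new_worker(_Sup, _Old, 0) ->
    timeout;
wait_for_new_worker(Sup, Old, Retries) ->
    case worker_pid(Sup) of
        Pid when is_pid(Pid), Pid =/= Old ->
            Pid;
        _ ->
            timer:sleep(20),
            wait_for_new_worker(Sup, Old, Retries - 1)
    end.
```

Killing the worker more than 5 times within 60 seconds would exceed the MaxR and MaxT given in init/1, at which point the supervisor gives up and terminates instead of restarting.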
