multiple nodedown messages in erlang cluster

I am building a simple gen_server module which monitors the activity of multiple remote nodes.
When a remote node registers, this module monitors the node with erlang:monitor_node(Node, true). This is done only once per node (confirmed with logs).
In the handle_info/2 callback of the gen_server, it catches the {nodedown, Node} message and demonitors the node with erlang:monitor_node(Node, false). I expect to receive this message only once: when the remote node goes down.
When I was testing the module, I found that when a remote node goes down, hundreds of {nodedown, Node} messages (the number varies from a few hundred to a few thousand) are sent to the gen_server.
Why does monitor_node send multiple messages? How can I prevent this behaviour?
EDIT: here is (a part of) the source code
register_node(#node_info{node = NodeName} = NodeInfo) ->
case mnesia:read(node_info, NodeName) of
[] ->
monitor_node(NodeName, true),
error_logger:info_msg("node ~p registered", [NodeName]);
[_OldInfo] ->
error_logger:trace_msg("info of node ~p updated", [NodeName])
end,
mnesia:write(NodeInfo).
handle_cast({register_node, #node_info{} = NodeStatus}, Timer) ->
case mnesia:transaction(fun register_node/1, [NodeStatus]) of
{aborted, Reason} ->
error_logger:warning_msg("transaction register_node failed: ~p", [Reason]);
_ ->
ok
end,
{noreply, Timer};
handle_cast({shutdown_node, #node_info{} = NodeStatus}, Timer) ->
case mnesia:dirty_delete_object(NodeStatus) of
{aborted, Reason} ->
error_logger:warning_msg("transaction shutdown_node failed: ~p", [Reason]);
_ ->
ok
end,
{noreply, Timer};
handle_cast(Message, Timer) ->
error_logger:warning_msg("~p: received unknown message ~p", [?MODULE, Message]),
{noreply, Timer}.
handle_info({nodedown, Node}, Timer) ->
monitor_node(Node, false),
error_logger:info_msg("~p: node ~p down", [?MODULE, Node]),
mnesia:transaction(fun mnesia:delete/3, [node_info, Node, write]),
{noreply, Timer};
handle_info(Message, Timer) ->
error_logger:warning_msg("~p: received unknown message ~p", [?MODULE, Message]),
{noreply, Timer}.

You have called monitor_node(NodeName, true) **INSIDE** the mnesia transaction.
A transaction fun can be retried many times (for example when locks cannot be acquired immediately), so any side effect inside it, such as monitor_node, may be executed hundreds of times. Since monitor_node involves message communication (an I/O operation), it does not belong inside a transaction: each retry sets up one more monitor, and when the node finally goes down you receive one nodedown message per monitor.
The documentation of erlang:monitor_node/2 says exactly this:
If a process has made two calls to monitor_node(Node, true) and Node terminates,
**two nodedown messages are delivered to the process.** If there is no connection
to Node, there will be an attempt to create one. If this fails, a nodedown
message is delivered.
Move that line out of the transaction (for example by using a case expression on the transaction result), and try again. For reference, the code in question:
register_node(#node_info{node = NodeName} = NodeInfo) ->
case mnesia:read(node_info, NodeName) of
[] ->
monitor_node(NodeName, true),
error_logger:info_msg("node ~p registered", [NodeName]);
[_OldInfo] ->
error_logger:trace_msg("info of node ~p updated", [NodeName])
end,
mnesia:write(NodeInfo).
handle_cast({register_node, #node_info{} = NodeStatus}, Timer) ->
case mnesia:transaction(fun register_node/1, [NodeStatus]) of
{aborted, Reason} ->
error_logger:warning_msg("transaction register_node failed: ~p", [Reason]);
_ ->
ok
end,
{noreply, Timer};
Explanation of side effects inside a Mnesia transaction, from the Mnesia documentation:
Mnesia dynamically sets and releases locks as transactions execute,
therefore, it is very dangerous to execute code with transaction
side-effects. In particular, a receive statement inside a transaction
can lead to a situation where the transaction hangs and never returns,
which in turn can cause locks not to release. This situation could
bring the whole system to a standstill since other transactions which
execute in other processes, or on other nodes, are forced to wait for
the defective transaction.
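A minimal sketch of that fix, assuming the rest of the module stays as shown above: keep only Mnesia operations inside the transaction, and set up the monitor once, after the transaction has committed.

register_node(#node_info{node = NodeName} = NodeInfo) ->
    %% only Mnesia operations here; safe to retry
    Result = case mnesia:read(node_info, NodeName) of
                 [] -> new;
                 [_OldInfo] -> updated
             end,
    mnesia:write(NodeInfo),
    Result.

handle_cast({register_node, #node_info{} = NodeStatus}, Timer) ->
    case mnesia:transaction(fun register_node/1, [NodeStatus]) of
        {atomic, new} ->
            %% the side effect now runs exactly once, after the transaction has committed
            monitor_node(NodeStatus#node_info.node, true),
            error_logger:info_msg("node ~p registered", [NodeStatus#node_info.node]);
        {atomic, updated} ->
            ok;
        {aborted, Reason} ->
            error_logger:warning_msg("transaction register_node failed: ~p", [Reason])
    end,
    {noreply, Timer};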

Related

How to efficiently use a receive clause in an erlang gen_server to resolve a timeout error?

Sometimes my loop returns ok because of a timeout; how do I write this code in a proper way? When there is a timeout it just returns ok instead of the value I am expecting. In handle_call I call a function loop(); in the loop() function I receive a message with a receive clause. I then send this data to my database, and the loop2 function returns the response from the database indicating whether the data has been saved successfully, giving that response back to loop(). But if there is a timeout, my loop function returns ok rather than the actual value.
% #Author: ZEESHAN AHMAD
% #Date: 2020-12-22 05:06:12
% #Last Modified by: ZEESHAN AHMAD
% #Last Modified time: 2021-01-10 04:42:59
-module(getAccDataCons).
-behaviour(gen_server).
-include_lib("deps/amqp_client/include/amqp_client.hrl").
-export([start_link/0, stop/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, code_change/3,
terminate/2]).
-export([get_account/0]).
start_link() ->
gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).
stop() ->
gen_server:cast(?MODULE, stop).
get_account() ->
gen_server:call(?MODULE, {get_account}).
init(_Args) ->
{ok, Connection} = amqp_connection:start(#amqp_params_network{host = "localhost"}),
{ok, Channel} = amqp_connection:open_channel(Connection),
{ok, Channel}.
handle_call({get_account}, _From, State) ->
amqp_channel:call(State, #'exchange.declare'{exchange = <<"get">>, type = <<"topic">>}),
amqp_channel:call(State, #'queue.declare'{queue = <<"get_account">>}),
Binding =
#'queue.bind'{exchange = <<"get">>,
routing_key = <<"get.account">>,
queue = <<"get_account">>},
#'queue.bind_ok'{} = amqp_channel:call(State, Binding),
io:format(" [*] Waiting for logs. To exit press CTRL+C~n"),
amqp_channel:call(State,#'basic.consume'{queue = <<"get_account">>, no_ack = true}),
Returned =loop(),
io:format("~nReti=~p",[Returned]),
{reply, Returned, State};
handle_call(Message, _From, State) ->
io:format("received other handle_call message: ~p~n", [Message]),
{reply, ok, State}.
handle_cast(stop, State) ->
{stop, normal, State};
handle_cast(Message, State) ->
io:format("received other handle_cast call : ~p~n", [Message]),
{noreply, State}.
handle_info(Message, State) ->
io:format("received handle_info message : ~p~n", [Message]),
{noreply, State}.
code_change(_OldVer, State, _Extra) ->
{ok, State}.
terminate(Reason, _State) ->
io:format("server is terminating with reason :~p~n", [Reason]).
loop()->
receive
#'basic.consume_ok'{} -> ok
end,
receive
{#'basic.deliver'{}, Msg} ->
#amqp_msg{payload = Payload} = Msg,
Value=loop2(Payload),
Value
after 2000->
io:format("Server timeout")
end.
loop2(Payload)->
Result = jiffy:decode(Payload),
{[{<<"account_id">>, AccountId}]} = Result,
Doc = {[{<<"account_id">>, AccountId}]},
getAccDataDb:create_AccountId_view(),
Returned=case getAccDataDb:getAccountNameDetails(Doc) of
success ->
Respo = getAccDataDb:getAccountNameDetails1(Doc),
Respo;
details_not_matched ->
user_not_exist
end,
Returned.
This is too long for an edit, so I put it in a new answer.
The reason why you receive ok when a timeout occurs is in the loop() code. In the second receive block, after 2000 ms you return
immediately after the io:format/1 call.
io:format returns ok, and that is what you get in the Returned variable. You should change this code to:
loop()->
ok = receive
#'basic.consume_ok'{} -> ok
end,
receive
{#'basic.deliver'{}, #amqp_msg{payload = Payload}} -> {ok,loop2(Payload)}
after 2000 ->
io:format("Server timeout"),
{error,timeout}
end.
With this code your client will receive either {ok,Value} or {error,timeout} and will be able to react accordingly.
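On the caller side, a minimal sketch of handling both results (get_account/0 is the API function from the question):

case getAccDataCons:get_account() of
    {ok, Value} ->
        %% use the value returned by the database
        Value;
    {error, timeout} ->
        %% decide how to react: retry, log, or report the error to the caller
        {error, timeout}
end.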
But there are still issues with this version:
- the 2 second timeout may be too short, so you could miss a valid answer
- since you are pattern matching in the receive blocks and do not check the result of each amqp_channel:call, many different problems can occur and show up as a timeout
First, let's have a look at the timeout. It is possible that the 4 amqp_channel calls really need more than 2 seconds in total to complete successfully. The simple solution is to increase your timeout, changing after 2000 to after 3000 or more.
But then you will have 2 issues:
Your gen_server is blocked during all this time, and if it is not dedicated to a single client, it will be unavailable to
serve any other request while it is waiting for the answer.
If you need to increase the timeout above 5 seconds, you will hit another timeout managed internally by gen_server:call: by default a request must be answered in less than 5 seconds.
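That internal limit can be overridden per call with gen_server:call/3 (the 10000 ms below is just an illustrative value):

get_account() ->
    %% the third argument is the call timeout in milliseconds (default 5000)
    gen_server:call(?MODULE, {get_account}, 10000).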
The gen_server module offers interface functions to solve this kind of problem: send_request, wait_response and reply. Here is a basic
gen_server which can handle 3 kinds of requests:
stop ... to stop the server, useful to update the code.
{blocking,Time,Value} the server will sleep for Time ms and then return Value. This simulates your case, and lets you tweak how long it takes to get an answer.
{non_blocking,Time,Value} the server will delegate the job to another process and return immediately without an answer (so it is available for another request). The new process will sleep for Time ms and then return Value using gen_server:reply.
The server module implements several user interfaces:
the standard start(), stop()
blocking(Time,Value) to call the server with the request {blocking,Time,Value} using gen_server:call
blocking_catch(Time,Value) same as the previous one, but catching the result of gen_server:call to show the hidden timeout
non_blocking(Time,Value,Wait) to call the server with the request {non_blocking,Time,Value} using gen_server:send_request and waiting for the answer for Wait ms maximum
Finally it includes 2 test functions
test([Type,Time,Value,OptionalWait]) spawns a process which sends a request of the given type with the corresponding parameters. The answer is sent back to the calling process and can be retrieved with flush() in the shell.
parallel_test([Type,Time,NbRequests,OptionalWait]) calls test NbRequests times with the corresponding parameters. It collects all
the answers and prints them using the local function collect_answers(NbRequests,Timeout).
Code below
-module (server_test).
-behaviour(gen_server).
%% API
-export([start/0,stop/0,blocking/2,blocking_catch/2,non_blocking/3,test/1,parallel_test/1]).
%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
terminate/2, code_change/3]).
-define(SERVER, ?MODULE).
%%%===================================================================
%%% API
%%%===================================================================
start() ->
gen_server:start_link({local, ?SERVER}, ?MODULE, [], []).
stop() ->
gen_server:cast(?SERVER, stop).
blocking(Time,Value) ->
gen_server:call(?SERVER, {blocking,Time,Value}).
blocking_catch(Time,Value) ->
catch {ok,gen_server:call(?SERVER, {blocking,Time,Value})}.
non_blocking(Time,Value,Wait) ->
ReqId = gen_server:send_request(?SERVER,{non_blocking,Time,Value}),
gen_server:wait_response(ReqId,Wait).
test([Type,Time,Value]) -> test([Type,Time,Value,5000]);
test([Type,Time,Value,Wait]) ->
Start = erlang:monotonic_time(),
From = self(),
F = fun() ->
R = case Type of
non_blocking -> ?MODULE:Type(Time,Value,Wait);
_ -> ?MODULE:Type(Time,Value)
end,
From ! {request,Type,Time,Value,got_answer,R,after_microsec,erlang:monotonic_time() - Start}
end,
spawn(F).
parallel_test([Type,Time,NbRequests]) -> parallel_test([Type,Time,NbRequests,5000]);
parallel_test([Type,Time,NbRequests,Wait]) ->
case Type of
non_blocking -> [server_test:test([Type,Time,X,Wait]) || X <- lists:seq(1,NbRequests)];
_ -> [server_test:test([Type,Time,X]) || X <- lists:seq(1,NbRequests)]
end,
collect_answers(NbRequests,Time + 1000).
%%%===================================================================
%%% gen_server callbacks
%%%===================================================================
init([]) ->
{ok, #{}}.
handle_call({blocking,Time,Value}, _From, State) ->
timer:sleep(Time),
Reply = {ok,Value},
{reply, Reply, State};
handle_call({non_blocking,Time,Value}, From, State) ->
F = fun() ->
do_answer(From,Time,Value)
end,
spawn(F),
{noreply, State};
handle_call(_Request, _From, State) ->
Reply = ok,
{reply, Reply, State}.
handle_cast(stop, State) ->
{stop,stopped, State};
handle_cast(_Msg, State) ->
{noreply, State}.
handle_info(_Info, State) ->
{noreply, State}.
terminate(_Reason, _State) ->
ok.
code_change(OldVsn, State, _Extra) ->
io:format("changing code replacing version ~p~n",[OldVsn]),
{ok, State}.
%%%===================================================================
%%% Internal functions
%%%===================================================================
do_answer(From,Time,Value) ->
timer:sleep(Time),
gen_server:reply(From, Value).
collect_answers(0,_Timeout) ->
got_all_answers;
collect_answers(NbRequests,Timeout) ->
receive
A -> io:format("~p~n",[A]),
collect_answers(NbRequests - 1, Timeout)
after Timeout ->
missing_answers
end.
Session in the shell:
44> c(server_test).
{ok,server_test}
45> server_test:start().
{ok,<0.338.0>}
46> server_test:parallel_test([blocking,200,3]).
{request,blocking,200,1,got_answer,{ok,1},after_microsec,207872}
{request,blocking,200,2,got_answer,{ok,2},after_microsec,415743}
{request,blocking,200,3,got_answer,{ok,3},after_microsec,623615}
got_all_answers
47> % 3 blocking requests in parallel, each lasting 200ms, they are executed in sequence but no timeout is reached
47> % All the clients get their answers
47> server_test:parallel_test([blocking,2000,3]).
{request,blocking,2000,1,got_answer,{ok,1},after_microsec,2063358}
{request,blocking,2000,2,got_answer,{ok,2},after_microsec,4127740}
missing_answers
48> % 3 blocking requests in parallel, each lasting 2000ms, they are executed in sequence and the last answer exceeds the gen_server timeout.
48> % The client for this request doesn't receive an answer. The client should also manage its own timeout to handle this case
48> server_test:parallel_test([blocking_catch,2000,3]).
{request,blocking_catch,2000,1,got_answer,{ok,1},after_microsec,2063358}
{request,blocking_catch,2000,2,got_answer,{ok,2},after_microsec,4127740}
{request,blocking_catch,2000,3,got_answer,
{'EXIT',{timeout,{gen_server,call,[server_test,{blocking,2000,3}]}}},
after_microsec,5135355}
got_all_answers
49> % same thing but catching the exception. After 5 seconds the gen_server call throws a timeout exception.
49> % The information can be forwarded to the client
49> server_test:parallel_test([non_blocking,200,3]).
{request,non_blocking,200,1,got_answer,{reply,1},after_microsec,207872}
{request,non_blocking,200,2,got_answer,{reply,2},after_microsec,207872}
{request,non_blocking,200,3,got_answer,{reply,3},after_microsec,207872}
got_all_answers
50> % using non blocking mechanism, we can see that all the requests were managed in parallel
50> server_test:parallel_test([non_blocking,5100,3]).
{request,non_blocking,5100,1,got_answer,timeout,after_microsec,5136379}
{request,non_blocking,5100,2,got_answer,timeout,after_microsec,5136379}
{request,non_blocking,5100,3,got_answer,timeout,after_microsec,5136379}
got_all_answers
51> % if we increase the answer delay above 5000ms, all requests fail with the default timeout
51> server_test:parallel_test([non_blocking,5100,3,6000]).
{request,non_blocking,5100,1,got_answer,{reply,1},after_microsec,5231611}
{request,non_blocking,5100,2,got_answer,{reply,2},after_microsec,5231611}
{request,non_blocking,5100,3,got_answer,{reply,3},after_microsec,5231611}
got_all_answers
52> % but thanks to the send_request/wait_response/reply interfaces, the client can adjust the timeout to an accurate value
52> % for each request
The next reason why the request may not complete is that one of the amqp_channel:call requests fails. Depending on what you want to do, there are several
possibilities, from doing nothing (let it crash), to catching the exception, to handling every case explicitly. The next proposal uses a global catch:
handle_call({get_account,Timeout}, From, State) ->
F = fun() ->
do_get_account(From,State,Timeout)
end,
spawn(F), % delegate the job to another process and free the server
{noreply, State}; % I don't see any change of State in your code, this should be enough
...
do_get_account(From,State,Timeout) ->
% this block of code asserts the *_ok records that amqp_channel:call is expected to return. It will catch any error
% and return it as {error,...}. If everything goes well it returns {ok,Answer}
Reply = try
#'exchange.declare_ok'{} = amqp_channel:call(State, #'exchange.declare'{exchange = <<"get">>, type = <<"topic">>}),
#'queue.declare_ok'{} = amqp_channel:call(State, #'queue.declare'{queue = <<"get_account">>}),
Binding = #'queue.bind'{exchange = <<"get">>,
routing_key = <<"get.account">>,
queue = <<"get_account">>},
#'queue.bind_ok'{} = amqp_channel:call(State, Binding),
#'basic.consume_ok'{} = amqp_channel:call(State,#'basic.consume'{queue = <<"get_account">>, no_ack = true}),
{ok,wait_account_reply(Timeout)}
catch
Class:Exception -> {error,Class,Exception}
end,
gen_server:reply(From, Reply).
wait_account_reply(Timeout) ->
receive
% #'basic.consume_ok'{} -> ok % you do not handle this message; ignore it, it will be garbage collected when the process dies
{#'basic.deliver'{}, #amqp_msg{payload = Payload}} -> extract_account(Payload)
after Timeout->
server_timeout
end.
extract_account(Payload)->
{[{<<"account_id">>, AccountId}]} = jiffy:decode(Payload),
Doc = {[{<<"account_id">>, AccountId}]},
getAccDataDb:create_AccountId_view(), % What is the effect of this function, what is the return value?
case getAccDataDb:getAccountNameDetails(Doc) of
success ->
getAccDataDb:getAccountNameDetails1(Doc);
details_not_matched ->
user_not_exist
end.
And the client should look like:
get_account() ->
ReqId = gen_server:send_request(server_name,{get_account,2000}),
gen_server:wait_response(ReqId,2200).
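For completeness, the result of gen_server:wait_response/2 can be matched like this (a sketch; the shapes are those documented for send_request/wait_response in recent OTP releases):

case gen_server:wait_response(ReqId, 2200) of
    {reply, Reply} ->
        %% the server answered within the wait time
        Reply;
    timeout ->
        %% no answer yet; the request may still complete later
        {error, timeout};
    {error, {Reason, _ServerRef}} ->
        %% the server died before answering
        {error, Reason}
end.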
Without the loop and loop2 code it is hard to give an answer, and if the timeout is detected by one of these 2 functions, you must first change their behavior to avoid any timeout, or increase it to a value that works. If a timeout is necessary, then ensure that the return value is explicit when it occurs, for example {error,RequestRef,timeout} rather than ok.
Nevertheless the gen_server should not wait too long for an answer; you can modify your code as follows:
Instead of using gen_server:call(ServerRef,Request) in the client process, you could use:
RequestId = gen_server:send_request(ServerRef, Request),
Result = gen_server:wait_response(RequestId, Timeout),
And remove the timeout in loop and/or loop2. Doing this you can control the timeout on the client side, you can even set it to infinity (not a good idea!).
Or you can split your function into two parts:
gen_server:cast(ServerRef,{Request,RequestRef}),
% this will not wait for any answer, RequestRef is a tag to identify later
% if the request was fulfilled, you can use make_ref() to generate it
and later, or in another client process (this requires passing at least the RequestRef to that process), check the result of the request:
Answer = gen_server:call(ServerRef,{get_answer,RequestRef}),
case Answer of
no_reply -> ... % no answer yet
{ok,Reply} -> ... % handle the answer
end,
Finally you must modify the loop code to handle the RequestRef, send a message back to the server (again using gen_server:cast) with the result and the RequestRef, and store this result in the server state.
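A rough sketch of that server side, under stated assumptions: the pending results are kept in a map in the server state, and the {result,...} cast and {get_answer,...} call are illustrative names, not part of the original code. These would be extra clauses of the existing handle_cast/2 and handle_call/3 callbacks:

%% the worker process reports its result, tagged with RequestRef
handle_cast({result, RequestRef, Value}, #{pending := Pending} = State) ->
    {noreply, State#{pending := Pending#{RequestRef => Value}}};

%% the client asks later whether the answer for RequestRef has arrived
handle_call({get_answer, RequestRef}, _From, #{pending := Pending} = State) ->
    case maps:find(RequestRef, Pending) of
        error ->
            {reply, no_reply, State};
        {ok, Value} ->
            {reply, {ok, Value}, State#{pending := maps:remove(RequestRef, Pending)}}
    end;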
I don't think this second solution is worthwhile since it is more or less the same as the first one, but hand made, and it leaves you to manage many error cases (such as client death) that could otherwise end up as a kind of memory leak.

send message to gen_server before init

I am new to Erlang, and I have the following code:
-module (test_srv).
-behaviour (gen_server).
-export ([start_link/0, init/1, handle_info/2]).
start_link() ->
gen_server:start_link(?MODULE, [], []).
init([]) ->
self() ! asdasd,
{ok, new_state}.
handle_info(Msg, State) ->
io:format("server got ~p , now state is ~p~n", [Msg, State]),
{noreply, State}.
I test in erl shell:
1> {_, P} = test_srv:start_link().
server got asdasd , now state is new_state
The problem is: when a message is sent to the server while the server is not yet initialised and not ready, how does gen_server handle the message? I have the following guesses:
gen_server handles the message immediately and passes it to the handle_info callback, but the state being set up in the init callback is lost
gen_server stores the message while the server is not initialised, and delivers it after the server is initialised.
I want to know how Erlang or gen_server handles this. What is the principle of message handling?
I'm guessing that by "server is not initialised" you mean the rest of the init function still being executed. In that case your second guess is correct: it's guaranteed that handle_info will be executed after init has returned. Since the gen_server is a single process and it is already executing init, messages it sends to itself from init will only be processed after init has finished executing.
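A quick way to convince yourself of that ordering (a variation on the module above, with an artificial delay added purely for illustration):

init([]) ->
    self() ! asdasd,
    timer:sleep(2000),          %% init is still running; the message sits in the mailbox
    io:format("init finished~n"),
    {ok, new_state}.

In the shell, "init finished" is printed before "server got asdasd , now state is new_state", confirming that handle_info only runs once init has returned.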

How can I know when it's the last cycle of my process restarted by the supervisor in erlang

I have a simple_one_for_one supervisor which has gen_fsm children.
I want each gen_fsm child to send a message only on the last time it terminates.
Is there any way to know when is the last cycle?
here's my supervisor:
-module(data_sup).
-behaviour(supervisor).
%% API
-export([start_link/0,create_bot/3]).
%% Supervisor callbacks
-export([init/1]).
%%-compile(export_all).
%%%===================================================================
%%% API functions
%%%===================================================================
start_link() ->
supervisor:start_link({local, ?MODULE}, ?MODULE, []).
init([]) ->
RestartStrategy = {simple_one_for_one, 0, 1},
ChildSpec = {cs_fsm, {cs_fsm, start_link, []},
permanent, 2000, worker, [cs_fsm]},
Children = [ChildSpec],
{ok, {RestartStrategy, Children}}.
create_bot(BotId, CNPJ,Pid) ->
supervisor:start_child(?MODULE, [BotId, CNPJ, Pid]).
the Pid is the Pid of the process which starts the supervisor and gives orders to start the children.
-module(cs_fsm).
-behaviour(gen_fsm).
-compile(export_all).
-define(SERVER, ?MODULE).
-define(TIMEOUT, 5000).
-record(params, {botId, cnpj, executionId, pid}).
%%%===================================================================
%%% API
%%%===================================================================
start_link(BotId, CNPJ, Pid) ->
io:format("start_link...~n"),
Params = #params{botId = BotId, cnpj = CNPJ, pid = Pid},
gen_fsm:start_link(?MODULE, Params, []).
%%%===================================================================
%%% gen_fsm callbacks
%%%===================================================================
init(Params) ->
io:format("initializing~n"),
process_flag(trap_exit, true),
{ok, requesting_execution, Params, 0}.
requesting_execution(timeout,Params) ->
io:format("erqusting execution"),
{next_state, finished, Params,?TIMEOUT}.
finished(timeout, Params) ->
io:format("finished :)~n"),
{stop, normal, Params}.
terminate(shutdown, _StateName, Params) ->
Params#params.pid ! {terminated, self(),Params},
ok;
terminate(_Reason, _StateName, Params) ->
ok.
My point is that if the process fails in any of the states, it should send a message only if this is the last time it is restarted by the supervisor (according to its restart strategy).
If the gen_fsm fails, does it restart from the same state with the same state data? If not, how can I make that happen?
You can add sending the message to the Module:terminate/3 function which is called when one of the StateName functions returns {stop,Reason,NewStateData} to indicate that the gen_fsm should be stopped.
gen_fsm is a finite state machine so you decide how it transitions between states. Something that triggers the last cycle may also set something in the StateData that is passed to Module:StateName/3 so that the function that handles the state knows it's the last cycle. It's hard to give a more specific answer unless you provide some code which we could analyze and comment on.
EDIT after further clarification:
The supervisor doesn't notify its children which restart this is, and it also can't notify a child that this is the last restart. The latter is simply because it cannot know that a restart will be the last one until the child process actually crashes once more, which the supervisor can't possibly predict. Only after the child has crashed can the supervisor calculate how many times the child crashed during the period, and decide whether it is allowed to restart the child once more or whether that was the last restart and it is now time for the supervisor to die as well.
However, nothing is stopping the child from recording, e.g. in an ETS table, how many times it has been restarted. But that of course won't help with deducing which restart is the last one.
Edit 2:
When the supervisor restarts the child it starts it from scratch using the standard init function. Any previous state of the child before it crashed is lost.
Please note that a crash is an exceptional situation and it's not always possible to recover the state, because the crash could have corrupted it. Instead of trying to recover the state or asking the supervisor when it's done restarting the child, why not prevent the crash from happening in the first place? You have two options:
I. Use try/catch to catch any exceptional situations and act accordingly. It's possible to catch any error that would otherwise crash the process and cause the supervisor to restart it. You can add try/catch to any entry function inside the gen_fsm process so that any error condition is caught before it crashes the server. See example function 1 or example function 2:
read() ->
try
try_home() orelse try_path(?MAIN_CFG) orelse
begin io:format("Some Error", []) end
catch
throw:Term -> {error, Term}
end.
try_read(Path) ->
try
file:consult(Path)
catch
error:Error -> {error, Error}
end.
II. Spawn a new process to handle the job and trap EXIT signals when the process dies. This allows the gen_fsm to handle a job asynchronously and handle any errors in a custom way (not necessarily by restarting the process as a supervisor would do). This section titled Error Handling explains how to trap exit signals from child processes. And this is an example of trapping signals in a gen_server. Check the handle_info function, which contains a few clauses to trap different types of EXIT messages from child processes.
init([Cfg, Id, Mode]) ->
process_flag(trap_exit, true),
(...)
handle_info({'EXIT', _Pid, normal}, State) ->
{noreply, State};
handle_info({'EXIT', _Pid, noproc}, State) ->
{noreply, State};
handle_info({'EXIT', Pid, Reason}, State) ->
log_exit(Pid, Reason),
check_done(error, Pid, State);
handle_info(_, State) ->
{noreply, State}.
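For completeness, a minimal sketch of the other half: spawning the linked worker whose exit the clauses above trap. do_work/1 and the {run, Job} message are illustrative assumptions, not part of the original example.

handle_cast({run, Job}, State) ->
    %% trap_exit was set in init/1, so if the worker crashes this process
    %% receives {'EXIT', Pid, Reason} instead of crashing itself
    _Pid = spawn_link(fun() -> do_work(Job) end),
    {noreply, State};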

How to make a gen_server reply with a message?

I have the gen_server shown below. It works for the most part. However, when I start it from the shell the replies come right back to the shell prompt. I would have expected them to be sent as messages back to the shell's pid, so that I could use flush() to see them.
What do I have to change in order to have the foo_worker send its replies as messages?
-module(foo_worker).
-behaviour(gen_server).
%% API
-export([start_link/1, start/1, init/1, send/3, die/1]).
-export([handle_call/3, handle_cast/2, handle_info/2, terminate/2]).
%%%-------------------------------------------------------------------
send(Worker, Ref, Counter) ->
gen_server:call(Worker, {inc, Ref, Counter}).
die(Worker) ->
gen_server:cast(Worker, die).
%%%-------------------------------------------------------------------
start_link(Limit) ->
gen_server:start_link(?MODULE, [Limit], []).
start(Limit) ->
gen_server:start(?MODULE, [Limit], []).
init([Limit]) ->
{ok, Limit}.
handle_call(_, _, Limit) when Limit =< 0 ->
exit({worker, eol});
handle_call({inc, Ref, Data}, From, Limit) ->
io:format("From ~p~n", [From]),
{reply, {Ref, updated, Data+1}, Limit - 1}.
handle_cast(die, _) ->
io:format("~p Dying ~n",[self()]),
exit(normal).
handle_info(Info, State) ->
io:format("Unkown message ~p for state ~p~n", [Info, State]).
terminate(Reason, State) ->
io:format("~p Died because ~p with state ~p~n", [self(), Reason, State]).
The whole point of gen_server:call/2,3 is to wrap into a function call the passing of a message into a gen_server process and the reception of its reply. If you want to deal only with messages, don't use gen_server:call/2,3 but rather have the caller invoke gen_server:cast/2 and include the caller pid in the message:
send(Worker, Ref, Counter) ->
gen_server:cast(Worker, {inc, Ref, Counter, self()}).
Then have your handle_cast/2 callback understand that message and use the pid to send the reply back to the caller:
handle_cast({inc, Ref, Data, From}, Limit) ->
From ! {Ref, updated, Data+1},
{noreply, Limit-1}.
By the way, note that when you choose this sort of approach, you need to deal with possible failure. If you pass a message to the gen_server process but it dies before it sends you a reply, you need to make sure the caller doesn't sit and wait forever for a reply that will never arrive. The best way to do this is with a monitor — you can have the caller monitor the gen_server process before sending it a message and demonitor it once it receives the reply. If the gen_server process dies, the caller will get a DOWN message instead (see the monitor documentation for details). Also note that by doing this you're reimplementing a bunch of what gen_server:call/2,3 already does for you.
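A minimal sketch of such a caller, built on the cast-based send above (the monitor guards against the worker dying before it replies; the 5000 ms timeout is just an illustrative value):

send_and_wait(Worker, Counter) ->
    Ref = make_ref(),
    MRef = erlang:monitor(process, Worker),
    gen_server:cast(Worker, {inc, Ref, Counter, self()}),
    receive
        {Ref, updated, NewValue} ->
            erlang:demonitor(MRef, [flush]),
            {ok, NewValue};
        {'DOWN', MRef, process, Worker, Reason} ->
            %% the worker died before replying
            {error, Reason}
    after 5000 ->
        erlang:demonitor(MRef, [flush]),
        {error, timeout}
    end.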

gen_server not getting messages after httpc call

I have one process which sends a pause message to a gen_server like so:
Results = [gen_server:cast(Child, pause) ||
{Id, Child, _Type, _Modules} <- supervisor:which_children(?SERVER),
?IGNORE(Id) == false],
In my gen_server, I catch these messages in my handle_cast as follows:
handle_cast(pause, #state{task=#task{server=Serv,
service=Srv,
description=Desc}}=State) ->
lager:info("Suspending ~s, ~s, ~s.",[Serv, Srv, Desc]),
{noreply, State#state{suspended=true}};
handle_cast(Msg, State) ->
lager:error("Url Poller received unexpected cast message: ~p",[Msg]),
{noreply, State}.
What's really strange is that fairly frequently one of my gen_servers doesn't seem to receive the pause message -- I get no lager message and the process in question will not respond to subsequent attempts to pause (or resume).
Any ideas about what might be going on?
The gen_server is very simple: it uses erlang:send_after/3 to send itself a "poll" message. Upon receiving this poll message, if not paused, it hits a URL, saves the response to an ETS table and fires off another erlang:send_after/3 to poll again after an appropriate interval. If it's paused, it simply fires off another erlang:send_after/3.
All pause does is set the state to paused = true.
Using observer, the stuck process shows that the current function is httpc:handle_answer and that the message queue is backing up
State Tab: Information "Timed out"
Tip: "system messages are probably not treated by this process"
the top of the stack trace shows
httpc:handle_answer httpc.erl:636
I picked the code of httpc:handle_answer from the Erlang/OTP inets HTTP client on GitHub:
(Note: it is not the same version as yours, since here the function goes from line 616 to 631.)
handle_answer(RequestId, false, _) ->
{ok, RequestId};
handle_answer(RequestId, true, Options) ->
receive
{http, {RequestId, saved_to_file}} ->
?hcrt("received saved-to-file", [{request_id, RequestId}]),
{ok, saved_to_file};
{http, {RequestId, {_,_,_} = Result}} ->
?hcrt("received answer", [{request_id, RequestId},
{result, Result}]),
return_answer(Options, Result);
{http, {RequestId, {error, Reason}}} ->
?hcrt("received error", [{request_id, RequestId},
{reason, Reason}]),
{error, Reason}
end.
So the process is waiting for a message (coming after a call to httpc_manager:request(Request, profile_name(Profile)) which has returned {ok, RequestId}), and this message either does not come or has the wrong format. Can you check the values of the parameters and the message queue?
Headers which contained a value other than a string caused the httpc_handler to exit. But after that, the caller hung at the 'receive' in httpc:handle_answer/3 forever, since no message was ever sent back to the caller.
You can reproduce it with this:
Request1= {"http://www.google.com",[{"cookie",undefined}, {"test",123}],"application/x-www-form-urlencoded; charset=utf-8", <<"">>}.
httpc:request(post, Request1, [{timeout,1000}], []).
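A defensive sketch (my assumption, not part of the original answer): normalise every header name and value to a string, and drop undefined values, before handing the request to httpc.

to_header_string(V) when is_list(V) -> V;
to_header_string(V) when is_binary(V) -> binary_to_list(V);
to_header_string(V) when is_atom(V) -> atom_to_list(V);
to_header_string(V) when is_integer(V) -> integer_to_list(V).

sanitize_headers(Headers) ->
    %% keep only headers with a defined value, converting names and values to strings
    [{to_header_string(K), to_header_string(V)} || {K, V} <- Headers, V =/= undefined].

For the Request1 above, sanitize_headers would drop {"cookie",undefined} and turn {"test",123} into {"test","123"}.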
