Good day,
I have a gen_server process which does some long-running state-updating tasks periodically in
handle_info:
handle_info(trigger, State) ->
NewState = some_long_running_task(),
erlang:send_after(?LOOP_TIME, self(), trigger),
{noreply, NewState}.
But when such task runs, then whole server gets unresponsive and any call to it leads to whole server crash:
my_gen_server:status().
** exception exit: {timeout,{gen_server,call,[my_gen_server,status]}}
in function gen_server:call/2
How it is possible to avoid blocking of gen_server ?
And when one call my_gen_server:status() at any time, the result should be something like:
{ok, task_active}
execute the long running task in a separate process. Let this process inform the gen_server of its progress with the task (that is if the task's progress can be tracked) OR let the process complete the task or fail but at least inform the gen_server of the results of the task.
Let the gen_server be linked with the process doing this long running task, and let the gen_server know the PID or registered name so that in case of exit signals, it can isolate the death of that important process from the Rest.
handle_info(trigger, State) ->
Pid = spawn_link(?MODULE,some_long_running_task,[State]),
NewState = save_pid(Pid,State),
{noreply, NewState};
handle_info({'EXIT',SomePid,_},State)->
case lookup_pid(State) == SomePid of
false -> %% some other process
{noreply,State};
true ->
%% our process has died
%% what do we do now ?
%% spawn another one ?
%% thats your decision to take
....
....
{noreply,State}
end;
handle_info({finished,TaskResult},State)->
.....%% update state e.t.c.
erlang:send_after(?LOOP_TIME, self(), trigger),
{noreply,NewState}.
some_long_running_task(ServerState)->
....do work
....return results
This call does not lead to a crash, but simply to an exception which can be caught:
status() ->
try gen_server:call(my_gen_server, status)
catch
exit:{timeout,_} -> {ok, task_active}
end.
However, the call will remain in the server's queue, and after it finishes handling the current message, it will send a reply message: {ServerRef, Reply}, which should be discarded by the calling process.
The only way to avoid blocking of any process in Erlang (whether gen_server or not) is not to run blocking tasks on it. So another alternative could be to run your long tasks on a different process which only talks to your server, so nobody cares that it's blocked.
Related
I am trying to update my process's state on a 10 second timer.
-define(INTERVAL, 3000).
start_link() ->
gen_server:start_link(?MODULE, [], []).
action(Pid, Action) ->
gen_server:call(Pid, Action).
init([]) ->
erlang:send_after(?INTERVAL, self(), trigger),
{ok, temple:new()}.
what I want to do is call this
handle_call({fight}, _From, Temple) ->
NewTemple = temple:fight(Temple),
{reply, NewTemple, NewTemple};
So I try
handle_info(trigger, _State) ->
land:action(self(), {fight}),
erlang:send_after(?INTERVAL, self(), trigger);
but I get
=ERROR REPORT==== 4-Dec-2016::19:00:35 ===
** Generic server <0.400.0> terminating
** Last message in was trigger
** When Server state == {{dict,0,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[]}}},
[]}
** Reason for termination ==
** {function_clause,[{land,terminate,
[{timeout,{gen_server,call,[<0.400.0>,{fight}]}},
{{dict,0,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[],[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],
[],[],[]}}},
[]}],
[{file,"src/land.erl"},{line,47}]}
It appears that with land:action(self(), {fight}), you're attempting to make a call to the same gen_server in which you're currently handling the trigger message. Two important facts will explain why this can't work:
A call always waits for the result to be returned.
A gen_server is a process, and a process can handle only one message at a time.
In handling the trigger message, you're saying to call back to yourself and wait for yourself to process the {fight} message. Since you're in the middle of handling the trigger message, though, you'll never get to the {fight} message. You're effectively in a deadlock with yourself. That's why you're getting a timeout.
P.S. Posting an SSCCE is far more likely to get you good answers.
The error message means that there is no terminate clause in the land server module, you should have a warning when you compile the land server module.
The terminate clause is called because a timeout occurs during the gen_server call with the parameters (Pid = <0.400.0>, Message = {fight}), which was called on line land:action(self(), {fight}),. A call to a gen_server must be completed within a maximum time, by default 5000ms, You have to reduce the time spent in the fight action.
Note that it is not a good idea to increase the server timeout since a gen_server call is blocking: during the execution of a gen_server call no new message can be handled, and in your example , it is also blocking the execution of the handle_info(trigger, _State) code.
Last point, the clause handle_info(trigger, _State) should return a tuple of the form {noreply,NewState} while the last line erlang:send_after(?INTERVAL, self(), trigger); returns a timer reference, you have to modify this line.
gen_server documentation on Module:terminate callback says:
Even if the gen_server process is not part of a supervision tree, this
function is called if it receives an 'EXIT' message from its parent.
Reason is the same as in the 'EXIT' message.
Here is my handle_info and terminate function:
handle_info(UnknownMessage, State) ->
io:format("Got unknown message: ~p~n", [UnknownMessage]),
{noreply, State}.
terminate(Reason, State) ->
io:format("Terminating with reason: ~p~n", [Reason]).
I start this server using gen_server:start. I assume when I call erlang:exit(Pid, fuckoff), it should call terminate callback function. But it shows:
Got unknown message: {'EXIT',<0.33.0>,fuckoff}
Which means it is calling handle_info. But when I call gen_server:stop, everything works as mentioned in documentation. I'm calling my gen_server from shell. Would you please clarify this?
[UPDATE]
Here is source code of decode_msg function inside gen_server. If it receives any 'EXIT' message it should call terminate function:
decode_msg(Msg, Parent, Name, State, Mod, Time, Debug, Hib) ->
case Msg of
{system, From, Req} ->
sys:handle_system_msg(Req, From, Parent, ?MODULE, Debug,
[Name, State, Mod, Time], Hib);
{'EXIT', Parent, Reason} ->
terminate(Reason, Name, Msg, Mod, State, Debug);
_Msg when Debug =:= [] ->
handle_msg(Msg, Parent, Name, State, Mod);
_Msg ->
Debug1 = sys:handle_debug(Debug, fun print_event/3,
Name, {in, Msg}),
handle_msg(Msg, Parent, Name, State, Mod, Debug1)
end.
In my case it doesn't call terminate function.
[UPDATE]
When I start gen_server using gen_server:start_link(), sending an exit signal using erlang:exit(Pid, Reason) will result in calling terminate call back function which is an expected behaviour. It seems there is a difference in interpreting an exit signal whether a process is linked to its parent or not.
Short answer:
If you call the exit/2 function from inside the gen_server actor, it behaves as expected based on the documentation and the terminate/2 callback will be called.
Long answer:
When you send the exit message from the shell, the Parent value of exit tuple is set to the shell process id, on the other hand when you start the gen_server process form shell its Parent value is set to its own process id, not the shell process id, therefore when it gets the exit message it doesn't match the second clause of the receive block in decode_msg/8 function so the terminate/6 function is not called and finally the next clause is matched which is calling handle_msg/5 function.
Recommendation:
For getting the terminate/3 callback called even by sending an exit message to the gen_server process, you can trap the exit message in handle_info/2 and then return with stop tuple as follows:
init([]) ->
process_flag(trap_exit, true),
{ok, #state{}}.
handle_info({'EXIT', _From, Reason}, State) ->
io:format("Got exit message: ~p~n", []),
{stop, Reason, State}.
terminate(Reason, State) ->
io:format("Terminating with reason: ~p~n", [Reason]),
%% do cleanup ...
ok.
When you start your gen_server, its a simple process, so, erlang:exit/1 or erlang:exit/2 work as expected.
If Pid is not trapping exits, Pid itself exits with exit reason Reason.
If Pid is trapping exits, the exit signal is transformed into a message {'EXIT', From, Reason} and delivered to the message queue of Pid.
So, currently your code trap 'EXIT' signal because this one is send like any other message into the mailbox and match handle_info/2 wildcard pattern.
If you want more information about that, you can read gen_server source code and see how it works. You can also find your problem described in this code.
I have a simple_one_for_one supervisor which has gen_fsm children.
I want each gen_fsm child to send a message only on the last time it terminates.
Is there any way to know when is the last cycle?
here's my supervisor:
-module(data_sup).
-behaviour(supervisor).
%% API
-export([start_link/0,create_bot/3]).
%% Supervisor callbacks
-export([init/1]).
%%-compile(export_all).
%%%===================================================================
%%% API functions
%%%===================================================================
start_link() ->
supervisor:start_link({local, ?MODULE}, ?MODULE, []).
init([]) ->
RestartStrategy = {simple_one_for_one, 0, 1},
ChildSpec = {cs_fsm, {cs_fsm, start_link, []},
permanent, 2000, worker, [cs_fsm]},
Children = [ChildSpec],
{ok, {RestartStrategy, Children}}.
create_bot(BotId, CNPJ,Pid) ->
supervisor:start_child(?MODULE, [BotId, CNPJ, Pid]).
the Pid is the Pid of the process which starts the superviser and gives orders to start the children.
-module(cs_fsm).
-behaviour(gen_fsm).
-compile(export_all).
-define(SERVER, ?MODULE).
-define(TIMEOUT, 5000).
-record(params, {botId, cnpj, executionId, pid}).
%%%===================================================================
%%% API
%%%===================================================================
start_link(BotId, CNPJ, Pid) ->
io:format("start_link...~n"),
Params = #params{botId = BotId, cnpj = CNPJ, pid = Pid},
gen_fsm:start_link(?MODULE, Params, []).
%%%===================================================================
%%% gen_fsm callbacks
%%%===================================================================
init(Params) ->
io:format("initializing~n"),
process_flag(trap_exit, true),
{ok, requesting_execution, Params, 0}.
requesting_execution(timeout,Params) ->
io:format("erqusting execution"),
{next_state, finished, Params,?TIMEOUT}.
finished(timeout, Params) ->
io:format("finished :)~n"),
{stop, normal, Params}.
terminate(shutdown, _StateName, Params) ->
Params#params.pid ! {terminated, self(),Params},
ok;
terminate(_Reason, _StateName, Params) ->
ok.
my point is that if the process fails in any of the states it should send a message only if it is the last time it is restarted by the supervisor (according to its restart strategy).
If the gen_fsm fails, does it restart from the same state with same state data? If not how can I cause it to happen?
You can add sending the message to the Module:terminate/3 function which is called when one of the StateName functions returns {stop,Reason,NewStateData} to indicate that the gen_fsm should be stopped.
gen_fsm is a finite state machine so you decide how it transitions between states. Something that triggers the last cycle may also set something in the StateData that is passed to Module:StateName/3 so that the function that handles the state knows it's the last cycle. It's hard to give a more specific answer unless you provide some code which we could analyze and comment on.
EDIT after further clarification:
Supervisor doesn't notify its children which time it has restarted them and it also can't notify the child that it's the last restart. This later is simply because it doesn't know that it's going to be the last until the supervisor process actually crashes once more, which the supervisor can't possibly predict. Only after the child crashed supervisor can calculate how many times the child crashed during a period of time and if it is allowed to restart the child once more or if that was the last restart and now it's time for the supervisor to die as well.
However, nothing is stopping the child from registering, e.g. in an ETS table, how many times it has been restarted. But it of course won't help with deducting which restart is the last one.
Edit 2:
When the supervisor restarts the child it starts it from scratch using the standard init function. Any previous state of the child before it crashed is lost.
Please note that a crash is an exceptional situation and it's not always possible to recover the state, because the crash could have corrupted the state. Instead of trying to recover the state or asking supervisor when it's done restarting the child, why not to prevent the crash from happening in the first place? You have two options:
I. Use try/catch to catch any exceptional situations and act accordingly. It's possible to catch any error that would otherwise crash the process and cause supervisor to restart it. You can add try/catch to any entry function inside the gen_fsm process so that any error condition is caught before it crashes the server. See example function 1 or example function 2:
read() ->
try
try_home() orelse try_path(?MAIN_CFG) orelse
begin io:format("Some Error", []) end
catch
throw:Term -> {error, Term}
end.
try_read(Path) ->
try
file:consult(Path)
catch
error:Error -> {error, Error}
end.
II. Spawn a new process to handle the job and trap EXIT signals when the process dies. This allows gen_fsm to handle a job asynchronously and handle any errors in a custom way (not necessarily by restarting the process as a supervisor would do). This section titled Error Handling explains how to trap exit signals from child processes. And this is an example of trapping signals in a gen_server. Check the handle_info function that contains a few clauses to trap different types of EXIT messages from children processes.
init([Cfg, Id, Mode]) ->
process_flag(trap_exit, true),
(...)
handle_info({'EXIT', _Pid, normal}, State) ->
{noreply, State};
handle_info({'EXIT', _Pid, noproc}, State) ->
{noreply, State};
handle_info({'EXIT', Pid, Reason}, State) ->
log_exit(Pid, Reason),
check_done(error, Pid, State);
handle_info(_, State) ->
{noreply, State}.
Is there a way to tell a gen_server: "supervisor has initialised all gen_servers, now you can send then messages"?
I have a worker gen_server whose job is to set up states of other gen_servers in his supervision tree. If I just start sending messages in init function of my configuration server, sometimes it gets {noproc, _}. I suppose that means that config server was to fast: he sent messages before supervisor had enough time to start all workers. I fixed that by putting timer:sleep(500) in config_server:init(), which ensures all gen_server had enough time to initialise, but this seems like a inelegant solution.
Is there a proper way to do this?
Return tuple with timeout 0 from init. Then immediately after it returns, handle_info(timeout, State) will be called. In handle_info make some call which won't return until the supervisor finishes initialization (e.g. supervisor:which_children).
info(PlayerId) ->
Pid = case global:whereis_name(util:getRegName({?MODULE, PlayerId})) of
P when is_pid(P) ->
P;
_ ->
{ok, P} = player_sup:start_child(PlayerId),
P
end,
gen_server:call(Pid, info).
This is my case to handle this issue. This worker process is triggered only when it is requested.
in function init() call gen_server:cast(init, State). message "init" will be first in message queue
Normally if I'd like to have an Erlang process timeout I would use the following construct:
receive
Msg -> ok; %% handle message
after 60000 ->
%% Handle timeout and exit
end.
Is there a similar mechanism in the OTP servers such as gen_fsm? I will be spawning gen_fsm's for each active session with my application, and would like to have them exit if a timeout value for inactivity is exceeded after receiving a message.
I can write my own custom process if need be, but would prefer to use a gen_fsm if possible.
I dug some more and found the answer to my own question.
There is an optional fourth argument in message handler "Result"s that you can use which is a timeout.
so:
some_fsm_state({set, Val}, State) ->
NewState = do(Val, State),
{next_state, another_fsm_state, NewState, 5000};
another_fsm_state(timeout, State) ->
handle_timeout(State).
another_fsm_state({set, Val}, State) ->
%% more code that handles this state.
Once some_fsm_state is called, it transitions to the next state of "another_fsm_state" with a timeout of 5000ms. If not new message is received within 5000ms, then another_fsm_state(timeout, State) is called.
Clever OTP programmers. :)
It should be noted that this fourth element in the Results tuple can be hibernate. Please see Erlang documentation for more information.
Erlang - Hibernate
gen_fsm docs