Related
I have a gen_server (my_gen_server.erl) that is started by another server (i.e. ejabberd).
Inside my_gen_server.erl, I start another server that handles HTTP/2 calls, like this:
{ok, ServerPid} = apns:connect(cert, my_first_connection).
Now, my_gen_server is receiving messages both from ejabberd and ServerPid which I handle as follows:
1. handle_info({reconnecting, ServerPid}=Msg, State) -> %% do Something
2. handle_info({connection_up, ServerPid}=Msg, State) -> %% do Something
3. handle_info(#offline_msg{...} = _Msg, State) -> %% do Something
So 1 & 2 are sent by ServerPid and 3 is sent by ejabberd. This is working, but I am not sure it is the correct behavior. So,
My question is:
Is it correct gen_server behavior to receive/handle messages from multiple client processes?
Please help.
Any process that has the gen_server's pid can send the gen_server a message using !, which will be handled by the gen_server's function:
handle_info()
Any process that has the gen_server's pid can call the functions:
call(GenServerPid, Msg)
cast(GenServerPid, Msg)
which will be handled by the gen_server functions:
handle_call()
handle_cast()
In Elixir, there is a module called Agent, which is just a gen_server that stores state, like a counter. Multiple processes can update the counter and retrieve the current count. Of course, some process has to start the gen_server, then pass the pid to the other processes that want to update/retrieve the count.
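To make this concrete, here is a minimal, self-contained Erlang sketch (a hypothetical multi_client_server module, not taken from the question) that accepts raw ! messages in handle_info/2 and gen_server calls/casts from any number of client processes:

-module(multi_client_server).
-behaviour(gen_server).

-export([start_link/0, get_count/1, bump/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% Synchronous API -- any client process may call this.
get_count(Pid) ->
    gen_server:call(Pid, get_count).

%% Asynchronous API -- any client process may cast this.
bump(Pid) ->
    gen_server:cast(Pid, bump).

init([]) ->
    {ok, 0}.

handle_call(get_count, _From, Count) ->
    {reply, Count, Count}.

handle_cast(bump, Count) ->
    {noreply, Count + 1}.

%% Raw messages sent with ! (e.g. Pid ! {hello, self()}) land here,
%% no matter which process sent them.
handle_info({hello, From}, Count) ->
    From ! {hi_back, self()},
    {noreply, Count};
handle_info(_Other, Count) ->
    {noreply, Count}.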
I am implementing the Gossip Algorithm, in which multiple actors spread a gossip at the same time in parallel. The system stops when each of the actors has listened to the gossip 10 times.
Now, I have a scenario in which I am checking the listen count of the recipient actor before sending the gossip to it. If the listen count is already 10, then the gossip will not be sent to the recipient actor. I am doing this using a synchronous call to get the listen count.
def get_message(server, msg) do
  GenServer.call(server, {:get_message, msg})
end

def handle_call({:get_message, msg}, _from, state) do
  listen_count = hd(state)
  {:reply, listen_count, state}
end
The program runs well at the start, but after some time GenServer.call stops with a timeout error like the following. After some debugging, I realized that GenServer.call becomes dormant and never reaches the corresponding handle_call. Is this behavior expected when using synchronous calls? Since all actors are independent, shouldn't the GenServer.call invocations run independently, without waiting for each other's responses?
02:28:05.634 [error] GenServer #PID<0.81.0> terminating
** (stop) exited in: GenServer.call(#PID<0.79.0>, {:get_message, []}, 5000)
** (EXIT) time out
(elixir) lib/gen_server.ex:774: GenServer.call/3
Edit: The following code can reproduce the error when running in iex shell.
defmodule RumourActor do
  use GenServer

  def start_link(opts) do
    {:ok, pid} = GenServer.start_link(__MODULE__, opts)
    {pid}
  end

  def set_message(server, msg, recipient) do
    GenServer.cast(server, {:set_message, msg, server, recipient})
  end

  def get_message(server, msg) do
    GenServer.call(server, :get_message)
  end

  def init(opts) do
    state = opts
    {:ok, state}
  end

  def handle_cast({:set_message, msg, server, recipient}, state) do
    :timer.sleep(5000)
    c = RumourActor.get_message(recipient, [])
    IO.inspect c
    {:noreply, state}
  end

  def handle_call(:get_message, _from, state) do
    count = tl(state)
    {:reply, count, state}
  end
end
Open an iex shell and load the above module. Start two processes using:
a = RumourActor.start_link(["", 3])
b = RumourActor.start_link(["", 5])
Produce the error by creating the deadlock condition mentioned by Dogbert in the comments. Run the following two calls within a short time of each other:
cb = RumourActor.set_message(elem(a,0), [], elem(b,0))
ca = RumourActor.set_message(elem(b,0), [], elem(a,0))
Wait for 5 seconds. The error will appear.
A gossip protocol is a way of dealing with asynchronous, unknown, unconfigured (random) networks that may be suffering intermittent outages and partitions and where no leader or default structure is present. (Note that this situation is somewhat unusual in the real world and out-of-band control is always imposed on systems in some way.)
With that in mind, let's change this to be an asynchronous system (using cast) so that we are following the spirit of the concept of chatty gossip style communication.
We need a digest of messages that counts how many times a given message has been received, a set of messages that have already gone over the magic number (so we don't re-send one if it arrives way late), and a list of processes enrolled in our system so we know to whom we are broadcasting:
(The following example is in Erlang because I just trip over Elixir syntax ever since I stopped using it...)
-module(rumor).

-record(s,
        {peers  = [] :: [pid()],
         digest = #{} :: #{message_id() => non_neg_integer()},
         dead   = sets:new() :: sets:set(message_id())}).

-type message_id() :: zuuid:uuid().
Here I am using a UUID, but it could be whatever. An Erlang reference would be fine for a test case, but since gossip isn't useful within an Erlang cluster, and references are unsafe outside the originating system, I'm just jumping to the assumption that this is for a networked system.
We will need an interface function that allows us to tell a process to inject a new message into the system. We will also need an interface function that sends a message between two processes once it is already in the system. Then we will need an inner function that broadcasts messages to all the known (subscribed) peers. Ah, that means we need a greeting interface so that peer processes can notify each other of their presence.
We will also want a way to have a process tell itself to keep broadcasting over time. How long to set the interval on retransmission is not actually a simple decision -- it has everything to do with network topology, latency, variability, etc (you would actually probably occasionally ping peers and develop some heuristic based on the latency, drop peers that seem unresponsive, and so on -- but we're not going to get into that madness here). Here I'm just going to set it for 1 second because that is an easy to interpret interval for humans observing the system.
Note that everything below is asynchronous.
Interfaces...
insert(Pid, Message) ->
    gen_server:cast(Pid, {insert, Message}).

relay(Pid, ID, Message) ->
    gen_server:cast(Pid, {relay, ID, Message}).

greet(Pid) ->
    gen_server:cast(Pid, {greet, self()}).

make_introduction(Pid, PeerPid) ->
    gen_server:cast(Pid, {make_introduction, PeerPid}).
That last function is going to be our way as testers of the system to cause one of the processes to call greet/1 on some target Pid so they start to build a peer network. In the real world something slightly different usually goes on.
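As a quick usage sketch (assuming the usual gen_server boilerplate that this answer leaves out, i.e. a start_link/0 and an init/1 returning {ok, #s{}}), wiring a few peers together and injecting a message would look roughly like:

{ok, A} = rumor:start_link(),
{ok, B} = rumor:start_link(),
{ok, C} = rumor:start_link(),
rumor:make_introduction(A, B),
rumor:make_introduction(B, C),
rumor:insert(A, <<"psst, pass it on">>).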
Inside our gen_server callback for receiving a cast we will get:
handle_cast({insert, Message}, State) ->
    NewState = do_insert(Message, State),
    {noreply, NewState};
handle_cast({relay, ID, Message}, State) ->
    NewState = do_relay(ID, Message, State),
    {noreply, NewState};
handle_cast({greet, Peer}, State) ->
    NewState = do_greet(Peer, State),
    {noreply, NewState};
handle_cast({make_introduction, Peer}, State) ->
    NewState = do_make_introduction(Peer, State),
    {noreply, NewState}.
Pretty simple stuff.
Above I mentioned that we would need a way for this thing to tell itself to resend after a delay. To do that we are going to send ourselves a naked message to "redo_relay" after a delay using erlang:send_after/3 so we are going to need a handle_info/2 to deal with it:
handle_info({redo_relay, ID, Message}, State) ->
    NewState = do_relay(ID, Message, State),
    {noreply, NewState}.
Implementation of the message bits is the fun part, but none of this is terribly tricky. Forgive the do_relay/3 below -- it could be more concise, but I'm writing this in a browser off the top of my head, so...
do_insert(Message, State = #s{peers = Peers, digest = Digest}) ->
    MessageID = zuuid:v1(),
    NewDigest = maps:put(MessageID, 1, Digest),
    ok = broadcast(MessageID, Message, Peers),
    ok = schedule_resend(MessageID, Message),
    State#s{digest = NewDigest}.
do_relay(ID,
         Message,
         State = #s{peers = Peers, digest = Digest, dead = Dead}) ->
    case maps:find(ID, Digest) of
        {ok, Count} when Count >= 10 ->
            NewDigest = maps:remove(ID, Digest),
            NewDead = sets:add_element(ID, Dead),
            ok = broadcast(ID, Message, Peers),
            State#s{digest = NewDigest, dead = NewDead};
        {ok, Count} ->
            NewDigest = maps:put(ID, Count + 1, Digest),
            ok = broadcast(ID, Message, Peers),
            ok = schedule_resend(ID, Message),
            State#s{digest = NewDigest};
        error ->
            case sets:is_element(ID, Dead) of
                true ->
                    State;
                false ->
                    NewDigest = maps:put(ID, 1, Digest),
                    ok = broadcast(ID, Message, Peers),
                    ok = schedule_resend(ID, Message),
                    State#s{digest = NewDigest}
            end
    end.
broadcast(ID, Message, Peers) ->
    Forward = fun(P) -> relay(P, ID, Message) end,
    lists:foreach(Forward, Peers).

schedule_resend(ID, Message) ->
    _ = erlang:send_after(1000, self(), {redo_relay, ID, Message}),
    ok.
And now we need the social bits...
do_greet(Peer, State = #s{peers = Peers}) ->
    case lists:member(Peer, Peers) of
        false -> State#s{peers = [Peer | Peers]};
        true  -> State
    end.

do_make_introduction(Peer, State = #s{peers = Peers}) ->
    ok = greet(Peer),
    do_greet(Peer, State).
So what did all of the horribly untypespecced stuff up there do?
It avoided any possibility of a deadlock. The reason deadlocks are so, well, deadly in peer systems is that anytime you have two identical processes (or actors, or whatever) communicating synchronously, you have created a textbook case of a potential deadlock.
Any time A has a synchronous message headed toward B and B has a synchronous message headed toward A at the same time, you have a deadlock. There is no way to create two identical processes that call each other synchronously without creating a potential deadlock. In massively concurrent systems anything that might happen almost certainly will eventually, so you're going to run into this sooner or later.
Gossip is intended to be asynchronous for a reason: it is a sloppy, unreliable, inefficient way to deal with a sloppy, unreliable, inefficient network topology. Trying to make calls instead of casts not only defeats the purpose of gossip-style message relay, it also pushes you into impossible deadlock territory incident to changing the nature of the protocol from asynch to synch.
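To see that circular wait concretely, here is a minimal, self-contained sketch (a hypothetical deadlock_demo module, not taken from the question) in which each server synchronously calls its peer from inside handle_call/3; ask one server about the other and both calls time out:

-module(deadlock_demo).
-behaviour(gen_server).

-export([start_link/0, ask/2]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% The caller asks Pid, which in turn synchronously asks Other.
ask(Pid, Other) ->
    gen_server:call(Pid, {ask, Other}).

init([]) ->
    {ok, undefined}.

%% While blocked here waiting for the peer's reply, this server cannot
%% service the peer's call back to it -- a circular wait, so both calls
%% exit with a timeout after the default 5000 ms.
handle_call({ask, Other}, _From, State) ->
    Reply = gen_server:call(Other, {ask, self()}),
    {reply, Reply, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

In a shell: {ok, A} = deadlock_demo:start_link(), {ok, B} = deadlock_demo:start_link(), then deadlock_demo:ask(A, B). After about five seconds the inner call times out and both servers crash, just like the RumourActor example above.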
GenServer.call has a default timeout of 5000 milliseconds. So what is probably happening is that the message queue of the actor is filled with millions of messages, and by the time it gets to the call, the calling actor has already timed out.
You can handle the timeout using a try...catch:

try do
  c = RumourActor.get_message(recipient, [])
catch
  :exit, _reason ->
    # handle the timeout here
    :timed_out
end
Now, the called actor will eventually get to the call message and respond, and that reply will arrive as an unexpected message to the first process, which you'll need to handle in handle_info. So one way is to ignore the error in the catch block and send the rumour from handle_info.
Also, this will significantly degrade performance if there are many processes waiting to time out for 5 seconds before moving ahead. One could deliberately reduce the timeout and handle the reply in handle_info. This effectively reduces to using a cast and handling the reply from the other process.
Your blocking call needs to be broken into two non-blocking calls. So if A is making a blocking call to B, then instead of waiting for the reply, A can ask B to send its state to a given address (A's address) and move on.
Then A will handle that message separately and reply if necessary.
A.fun1():
    body of A before blocking call
    result = blockingcall()
    do things based on result

needs to be divided into:

A.send():
    body of A before blocking call
    nonblockingcall(A.receive)  # A.receive is where B should send results
    do other things

A.receive(result):
    do things based on result
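Here is a rough Erlang sketch of that split (the message shapes {get_count, From} and {count, N}, the #state{} fields, and the clause names are all hypothetical, and the module boilerplate is omitted): the asking server casts a request tagged with its own pid and keeps running, the peer answers with a plain message, and that answer is picked up later in handle_info/2.

%% In the asking server (A): fire the request and move on instead of
%% blocking inside the callback.
handle_cast({maybe_send, Rumour, Peer}, State) ->
    gen_server:cast(Peer, {get_count, self()}),
    {noreply, State#state{pending = Rumour}};

%% In the peer (B): answer by sending a message instead of a synchronous reply.
handle_cast({get_count, From}, State = #state{count = Count}) ->
    From ! {count, Count},
    {noreply, State}.

%% Back in A: the peer's answer arrives later as an ordinary message.
handle_info({count, Count}, State = #state{pending = Rumour})
  when Count < 10, Rumour =/= undefined ->
    %% the peer still wants to hear it; this is where the pending rumour
    %% would actually be forwarded
    {noreply, State#state{pending = undefined}};
handle_info({count, _}, State) ->
    {noreply, State#state{pending = undefined}}.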
Good day,
I have a gen_server process which does some long-running state-updating tasks periodically in
handle_info:
handle_info(trigger, State) ->
    NewState = some_long_running_task(),
    erlang:send_after(?LOOP_TIME, self(), trigger),
    {noreply, NewState}.
But while such a task runs, the whole server becomes unresponsive, and any call to it leads to a whole-server crash:
my_gen_server:status().
** exception exit: {timeout,{gen_server,call,[my_gen_server,status]}}
in function gen_server:call/2
How is it possible to avoid blocking the gen_server?
And when one calls my_gen_server:status() at any time, the result should be something like:
{ok, task_active}
Execute the long-running task in a separate process. Let this process inform the gen_server of its progress with the task (that is, if the task's progress can be tracked), or let the process complete the task or fail, but at least inform the gen_server of the result.
Let the gen_server be linked with the process doing this long-running task, and let the gen_server know its PID or registered name, so that in case of exit signals it can distinguish the death of that important process from the rest.
handle_info(trigger, State) ->
    %% NOTE: the server must trap exits (process_flag(trap_exit, true) in
    %% init/1) for the 'EXIT' message below to arrive as an info message.
    Pid = spawn_link(?MODULE, some_long_running_task, [State]),
    NewState = save_pid(Pid, State),
    {noreply, NewState};
handle_info({'EXIT', SomePid, _}, State) ->
    case lookup_pid(State) == SomePid of
        false -> %% some other process
            {noreply, State};
        true ->
            %% our process has died
            %% what do we do now ?
            %% spawn another one ?
            %% that's your decision to take
            ....
            ....
            {noreply, State}
    end;
handle_info({finished, TaskResult}, State) ->
    .....%% update state e.t.c.
    erlang:send_after(?LOOP_TIME, self(), trigger),
    {noreply, NewState}.

some_long_running_task(ServerState) ->
    ....do work
    ....return results
This call does not lead to a crash, but simply to an exception which can be caught:
status() ->
    try gen_server:call(my_gen_server, status)
    catch
        exit:{timeout,_} -> {ok, task_active}
    end.
However, the call will remain in the server's queue, and after it finishes handling the current message, it will send a reply message: {ServerRef, Reply}, which should be discarded by the calling process.
The only way to avoid blocking of any process in Erlang (whether gen_server or not) is not to run blocking tasks on it. So another alternative could be to run your long tasks on a different process which only talks to your server, so nobody cares that it's blocked.
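A rough sketch of that alternative (assuming a hypothetical #state{task = ...} field and reusing ?LOOP_TIME and some_long_running_task/0 from the question; module boilerplate omitted): the work runs in a spawned, linked helper, the gen_server only records whether a task is in flight, and status can then be answered immediately with {ok, task_active}.

handle_call(status, _From, State = #state{task = active}) ->
    {reply, {ok, task_active}, State};
handle_call(status, _From, State) ->
    {reply, {ok, idle}, State}.

handle_info(trigger, State) ->
    Server = self(),
    %% run the blocking work elsewhere; send the result back as a message
    spawn_link(fun() -> Server ! {finished, some_long_running_task()} end),
    {noreply, State#state{task = active}};
handle_info({finished, _Result}, State) ->
    erlang:send_after(?LOOP_TIME, self(), trigger),
    {noreply, State#state{task = idle}}.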
I'm getting started with Erlang, and could use a little help understanding the different results when passing the PID returned from spawn/3 to the process_info/1 function.
Given this simple code where the a/0 function is exported, which simply invokes b/0, which waits for a message:
-module(tester).
-export([a/0]).

a() ->
    b().

b() ->
    receive
        {Pid, test} ->
            Pid ! alrighty_then
    end.
...please help me understand the reason for the different output from the shell:
Example 1:
Here, current_function of Pid is shown as being tester:b/0:
Pid = spawn(tester, a, []).
process_info( Pid ).
> [{current_function,{tester,b,0}},
{initial_call,{tester,a,0}},
...
Example 2:
Here, current_function of process_info/1 is shown as being tester:a/0:
process_info( spawn(tester, a, []) ).
> [{current_function,{tester,a,0}},
{initial_call,{tester,a,0}},
...
Example 3:
Here, current_function of process_info/1 is shown as being tester:a/0, but the current_function of Pid is tester:b/0:
process_info( Pid = spawn(tester, a, []) ).
> [{current_function,{tester,a,0}},
{initial_call,{tester,a,0}},
...
process_info( Pid ).
> [{current_function,{tester,b,0}},
{initial_call,{tester,a,0}},
...
I assume there's some asynchronous code happening in the background when spawn/3 is invoked, but how does variable assignment and argument passing work (especially in the last example) such that Pid gets one value, and process_info/1 gets another?
Is there something special in Erlang that binds variable assignment in such cases, but no such binding is offered to argument passing?
EDIT:
If I use a function like this:
TestFunc = fun( P ) -> P ! {self(), test}, flush() end.
TestFunc( spawn(tester,a,[]) ).
...the message is returned properly from tester:b/0:
Shell got alrighty_then
ok
But if I use a function like this:
TestFunc2 = fun( P ) -> process_info( P ) end.
TestFunc2( spawn(tester,a,[]) ).
...the process_info/1 still shows tester:a/0:
[{current_function,{tester,a,0}},
{initial_call,{tester,a,0}},
...
Not sure what to make of all this. Perhaps I just need to accept it as being above my pay grade!
If you look at the docs for spawn it says it returns the newly created Pid and places the new process in the system scheduler queue. In other words, the process gets started but the caller keeps on executing.
Erlang is different from some other languages in that you don't have to explicitly yield control, but rather you rely on the process scheduler to determine when to execute which process. In the cases where you were making an assignment to Pid, the scheduler had ample time to switch over to the spawned process, which subsequently made the call to b/0.
It's really quite simple. The execution of the spawned process starts with a call to a() which at some point shortly afterwards will call b() and then just sits there and waits until it receives a specific message. In the examples where you manage to immediately call process_info on the pid, you catch it while the process is still executing a(). In the other cases, when some delay is involved, you catch it after it has called b(). What about this is confusing?
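A quick way to see this in the shell (a sketch assuming the tester module above is compiled): give the spawned process a moment to be scheduled before inspecting it, and current_function shows tester:b/0 even when the pid is passed straight to process_info:

P = spawn(tester, a, []),
timer:sleep(10),                %% let the scheduler run the new process into b/0
process_info(P, current_function).
%% => {current_function,{tester,b,0}}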
If I want to always send an event to the initial state of a gen_fsm when I have spawned it, where should I put that function call? Right after start_link, or from the process that invoked start_link in the first place? Are there any best practices here?
If you just want to alter the state of the FSM after you start it, you might simply implement the init function for your state machine:
Reading from: http://www.erlang.org/doc/man/gen_fsm.html#Module:init-1
Whenever a gen_fsm is started using gen_fsm:start/3,4 or gen_fsm:start_link/3,4, this function is called by the new process to initialize.
Args is the Args argument provided to the start function.
If initialization is successful, the function should return {ok,StateName,StateData}, {ok,StateName,StateData,Timeout} or {ok,StateName,StateData,hibernate}, where StateName is the initial state name and StateData the initial state data of the gen_fsm.
Also, by using the init function you're sure about the atomicity of the two operations (start_link and init): they will either both succeed or both fail.
I think it is right to send the first event from the process invoking the FSM start function, or to return timeout = 0 from init/1 and handle the 'timeout' event in the initial state.
On the other hand, that makes races possible if your gen_fsm is a registered process. If that is the case, I would send the message to the gen_fsm process PID from init/1, before registering.
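A small sketch of the timeout = 0 variant (with a hypothetical initial state name waiting; the remaining gen_fsm callbacks are omitted): init/1 returns a zero timeout, so the initial state receives a timeout event almost immediately after init/1 returns, unless some other message is already waiting.

init(Args) ->
    %% the trailing 0 makes the initial state receive 'timeout' right away
    {ok, waiting, Args, 0}.

waiting(timeout, StateData) ->
    %% this runs as the FSM's first event -- kick things off here
    {next_state, running, StateData}.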