Erlang supervisor termination behavior

I have what may be an unusual situation, an application that starts 2 top-level supervisors, e.g.,
...
-behavior(application).
...
start(_StartType, _StartArgs) ->
    sup1:start_link(),
    sup2:start_link().
They both have a {one_for_one, 0, 1} restart strategy. Their children implement a simple crash function that raises a badmatch error.
To my question: if I call sup1_child1:crash(), supervisor sup1 terminates but the application keeps running (i.e., supervisor sup2 and its children are still available). If instead I call sup2_child1:crash(), the entire application terminates. This latter behavior is what I expect in both cases. If I flip the order of the start_link() calls, i.e.,
...
    sup2:start_link(),
    sup1:start_link().
then crashing sup1 will cause the application to terminate but crashing sup2 will not. So it appears the order in which start_link() is called determines which supervisor crash will cause the application to terminate. Is this expected? Or am I abusing the supervision tree capability by having 2 root supervisors?
Thanks,
Rich

It is entirely expected, and it is expected because you are abusing the supervision tree capability. Every application has a hidden process called the "application master". Your application's start/2 callback is supposed to return {ok, Pid} for a SINGLE pid, and that is the only process the application master tracks. Since start/2 returns the value of its last expression, only the supervisor started last is tracked; the other one can die without the application noticing. If the tracked process terminates, the application terminates too, and the whole Erlang node may go down with it depending on how the application was started (similar to worker processes, applications can be permanent, transient, or temporary).
You should have one top-level supervisor per application. If you need two supervisors at the top level, they should both be children of that top-level supervisor.
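A minimal sketch of that layout, assuming a hypothetical root_sup module. To keep the example in one file, start_sub/1 and the second init/1 clause stand in for the real sup1:start_link/0 and sup2:start_link/0; the {0, 1} restart intensity is carried over from the question.

```erlang
-module(root_sup).
-behaviour(supervisor).

-export([start_link/0, start_sub/1]).
-export([init/1]).

%% Start the single top-level supervisor (hypothetical name root_sup).
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, root).

%% Stand-in for sup1:start_link/0 and sup2:start_link/0 from the question.
start_sub(Name) ->
    supervisor:start_link({local, Name}, ?MODULE, {sub, Name}).

%% Root supervisor: both former top-level supervisors become its children.
init(root) ->
    SupFlags = #{strategy => one_for_one, intensity => 0, period => 1},
    {ok, {SupFlags, [sub_spec(sup1), sub_spec(sup2)]}};
%% Sub-supervisors: left without children here; the real ones start workers.
init({sub, _Name}) ->
    SupFlags = #{strategy => one_for_one, intensity => 0, period => 1},
    {ok, {SupFlags, []}}.

sub_spec(Name) ->
    #{id => Name,
      start => {?MODULE, start_sub, [Name]},
      restart => permanent,
      type => supervisor}.
```

With this in place, the application's start/2 would only call root_sup:start_link() and return its result, so a crash in either sub-supervisor reaches the application master through root_sup.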

Related

How do you hook to an application in Erlang

I do not understand how you hook into an Erlang application, since it does not return a Pid.
Consider, for example, the snippet below. I am starting a process that receives messages to handle. However, my application behaviour does not return anything.
How do I get hold of the Pid I am interested in when using the application behaviour?
.app
{application, simple_app,
 [
  {description, "something"},
  {mod, {simple_app, []}},
  {modules, [proc]}
 ]}.
app
-module(simple_app).
-behaviour(application).
-export([start/2, stop/1]).
start(_StartType, _StartArgs) ->
    proc:start().
stop(_State) ->
    ok.
module
-module(proc).
%% loop/0 must be exported because spawn_link/3 calls it by module and name.
-export([start/0, loop/0]).
start() ->
    Pid = spawn_link(?MODULE, loop, []),
    {ok, Pid}.
loop() ->
    receive
        {From, Message} ->
            From ! {ok, Message},
            loop();
        _ ->
            loop()
    end.
P.S. I am trying to understand how I get the root Pid so that I can use it to issue commands. In my case I need the Pid returned by proc:start. If my root were a supervisor, I would need the Pid of the supervisor. The application does not return a Pid, so how do I hook into it?
The question thus is: when starting the application, wouldn't I need a Pid returned by it to then be able to issue commands against?

Your application must depend on kernel and stdlib. You should define their names in your .app file, for example:
{application, simple_app,
 [
  {description, "something"},
  {mod, {simple_app, []}},
  {modules, [proc]},
  {applications, [kernel, stdlib]}
 ]}.
When you want to start your app, you should use the application module which is part of the kernel application.
It starts some processes to manage your application and handle its I/O. It calls YOUR_APP:start(_, _), and this function MUST return {ok, Pid}, where Pid is a process running the supervisor behaviour. We often call it the root supervisor of the app.
So you have to define an application behaviour (as you did) and a supervisor behaviour.
This supervisor process may start your workers which are doing anything your app wants to do.
If you want to start a process, you define its start specification in your supervisor module. So kernel starts your app and your app starts your supervisor and your supervisor starts your worker(s).
You can register your worker pid with a name and you can send it messages by using its name.
If you have lots of workers you can use a pool of pids which maintains your worker pids.
I think it's OK to play with spawn and spawn_link and sending messages manually to processes. But in production code we usually don't do this. We use OTP behaviours and they do this for us in a reliable and clean manner.
I think it's better to write some gen_servers (another behaviour) and play with handle_call and handle_cast, etc callbacks. Then run some gen_servers under a supervision tree and play with the supervisor API to kill or terminate its children, etc. Then start writing a complete application.
Remember to read the documentation for behaviours carefully.
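As a rough sketch of the advice above, here is the question's echo loop rewritten as a gen_server that registers itself under a name, so callers never need to hold its Pid. The module name proc_server and the echo/1 function are made up for illustration; in a real application the process would be started from the root supervisor's child specification rather than by hand.

```erlang
-module(proc_server).
-behaviour(gen_server).

%% Public API (hypothetical names)
-export([start_link/0, echo/1]).
%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2]).

%% Register the process locally under the module name.
start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Callers address the server by its registered name, not by Pid.
echo(Message) ->
    gen_server:call(?MODULE, {echo, Message}).

init([]) ->
    {ok, #{}}.

%% Same reply shape as the question's loop/0: {ok, Message}.
handle_call({echo, Message}, _From, State) ->
    {reply, {ok, Message}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

Once the application is running, any process can call proc_server:echo(hello) without knowing which Pid currently serves the name, which also keeps working after a supervisor restart.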

How to iterate through many GenServers of the same type?

Problem: I have a FlyServer type and need to iterate through all the Fly processes for various computations on the server.
How do I accomplish this?
One option is to have a GenServer list of all the FlyServer processes. But what if it crashes? And what if a player crashes and for whatever reason the GenServer keeping track of the processes isn't notified --- chime in if that scenario is unrealistic please.

I advise you to start your servers using a supervisor with a call to supervisor:start_child/2. The supervisor should use the strategy simple_one_for_one which is meant to create and supervise processes of the same kind.
Then you can get an up-to-date list of all the children using the function supervisor:which_children/1.
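A small sketch of this approach, under some assumptions: the module name fly_sup, the restart limits, and the helper names are invented here, and a bare proc_lib process stands in for the real FlyServer so the example fits in one file.

```erlang
-module(fly_sup).
-behaviour(supervisor).

-export([start_link/0, start_fly/1, fly_pids/0, start_fly_process/1]).
-export([init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Start one more Fly; [Name] is appended to the child spec's start arguments.
start_fly(Name) ->
    supervisor:start_child(?MODULE, [Name]).

%% Pids of all Fly processes currently alive under the supervisor.
fly_pids() ->
    [Pid || {_Id, Pid, _Type, _Mods} <- supervisor:which_children(?MODULE),
            is_pid(Pid)].

init([]) ->
    SupFlags = #{strategy => simple_one_for_one, intensity => 5, period => 10},
    ChildSpec = #{id => fly,
                  start => {?MODULE, start_fly_process, []},
                  restart => transient},
    {ok, {SupFlags, [ChildSpec]}}.

%% Stand-in for FlyServer:start_link/1; it must link to the caller
%% (the supervisor) and return {ok, Pid}.
start_fly_process(Name) ->
    {ok, proc_lib:spawn_link(fun() -> fly_loop(Name) end)}.

fly_loop(Name) ->
    receive
        stop -> ok;
        _Other -> fly_loop(Name)
    end.
```

Because the supervisor itself holds the list, there is no separate bookkeeping process that could drift out of sync with the set of live children.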

Every time a Fly process contacts the server, you can add its pid to a list, where the list is part of the gen_server's State.
The server can then monitor the Fly process, which means that when a Fly process terminates, the server is sent a 'DOWN' message.
The server can implement a handle_info/2 clause that pattern matches the 'DOWN' message and then removes the terminated process's pid from the list.
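The monitoring idea could look roughly like the following. This is only a sketch: the fly_registry module and its register_fly/1 and flies/0 functions are hypothetical, and the state is kept as a map from monitor reference to pid instead of a plain list so that removal is a single lookup.

```erlang
-module(fly_registry).
-behaviour(gen_server).

-export([start_link/0, register_fly/1, flies/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Called when a Fly process contacts the server.
register_fly(Pid) when is_pid(Pid) ->
    gen_server:call(?MODULE, {register, Pid}).

%% Pids of all Fly processes still believed to be alive.
flies() ->
    gen_server:call(?MODULE, flies).

init([]) ->
    {ok, #{}}.  %% State: #{MonitorRef => Pid}

handle_call({register, Pid}, _From, State) ->
    Ref = erlang:monitor(process, Pid),
    {reply, ok, State#{Ref => Pid}};
handle_call(flies, _From, State) ->
    {reply, maps:values(State), State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% The monitor delivers this message when a registered Fly terminates.
handle_info({'DOWN', Ref, process, _Pid, _Reason}, State) ->
    {noreply, maps:remove(Ref, State)};
handle_info(_Other, State) ->
    {noreply, State}.
```

A monitor also fires immediately if the pid is already dead at registration time, so a Fly that crashes right after contacting the server is still cleaned up.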
One option is to have a GenServer list of all the FlyServer processes.
But what if it crashes?
Then terminate(Reason, State) will normally be called in the callback module, which can save State to an ets, dets, or mnesia table. It is not called if the process is killed outright, for example with exit(Pid, kill). And of course, if someone trips over the cord that connects the machine running the FlyServer to an electrical outlet, execution halts immediately and terminate() is not called either. See distributed Erlang for solutions.

In Erlang, what's the difference between gen_server:start() and gen_server:start_link()?

Can someone explain what's the difference between gen_server:start() and gen_server:start_link()?
I've been told that it's something about multi threading stuff.
EDIT:
If my gen_server is called from multiple threads, will it execute them all at once? Or will it create concurrency between these threads?

Both functions start a new gen_server process, but they differ in that gen_server:start_link/3,4 atomically starts the gen_server and links it to the calling process, which becomes its parent. Linking means that if the child exits abnormally, the parent will by default also exit. Supervisors are parent processes that trap exits and use links to take specific actions when their child processes exit abnormally, typically restarting them.
Other than the linking involved in the gen_server:start_link case, there are no multi-process aspects involved in these calls. Regardless of whether you use gen_server:start or gen_server:start_link to start a new gen_server, the new process has a single message queue, and it receives and processes those messages one at a time. There is nothing about gen_server:start_link that causes the new gen_server process to behave or perform differently than it would if started with gen_server:start.
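To make the difference concrete, here is a small sketch (the link_demo module and its function names are invented for illustration) that starts the same trivial gen_server both ways and checks whether the caller ends up linked to it.

```erlang
-module(link_demo).
-behaviour(gen_server).

-export([linked_after_start/0, linked_after_start_link/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Start without a link and report whether the caller is linked to the server.
linked_after_start() ->
    {ok, Pid} = gen_server:start(?MODULE, [], []),
    is_linked(Pid).

%% Start with a link and report the same thing.
linked_after_start_link() ->
    {ok, Pid} = gen_server:start_link(?MODULE, [], []),
    is_linked(Pid).

%% Look for Pid among the caller's links, then stop the server normally.
is_linked(Pid) ->
    {links, Links} = erlang:process_info(self(), links),
    Linked = lists:member(Pid, Links),
    ok = gen_server:stop(Pid),
    Linked.

init([]) ->
    {ok, no_state}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

Apart from the link, the two servers are identical: each has one mailbox and handles one message at a time.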

When you use gen_server:start_link, the new process becomes a "child" of the calling process, so it can be part of a supervision tree. The link allows the calling process to be notified if the gen_server process dies.
Using gen_server:start will spawn the process outside of the supervision tree.
There is a nice description of supervision in Erlang here: http://learnyousomeerlang.com/supervisors

Otp application:stop(..) kills all spawned processes, not just spawn_linked ones?

I've set up a simple test-case at https://github.com/bvdeenen/otp_super_nukes_all that shows that an OTP application:stop() actually kills all processes spawned by its children, even the ones that are not linked.
The test-case consists of one gen_server (registered as par) spawning a plain erlang process (registered as par_worker) and a gen_server (registered as reg_child), which also spawns a plain erlang process (registered as child_worker). Calling application:stop(test_app) does a normal termination on the 'par' gen_server, but an exit(kill) on all others!
Is this nominal behaviour? If so, where is it documented, and can I disable it? I want the processes I spawn from my gen_server (not link), to stay alive when the application terminates.
Thanks
Bart van Deenen

The application manual says (for the stop/1 function):
Last, the application master itself terminates. Note that all processes with the
application master as group leader, i.e. processes spawned from a process belonging
to the application, thus are terminated as well.
So I guess you can't modify this behavior.
EDIT: You might be able to change the group_leader of the started process with group_leader(GroupLeader, Pid) -> true (see: http://www.erlang.org/doc/man/erlang.html#group_leader-2). Changing the group_leader might allow you to avoid killing your process when the application ends.
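A sketch of the group leader idea, with the caveat that it is untested against the linked test-case. The detach_demo module is hypothetical, and it assumes the standard user I/O process is registered on the node; the new process switches its own group leader before running any user code, so it no longer has the application master as group leader.

```erlang
-module(detach_demo).
-export([spawn_detached/1]).

%% Spawn an unlinked process whose group leader is the 'user' I/O process
%% instead of the spawning application's application master.
%% Assumption: 'user' is registered (true for a normally started node).
spawn_detached(Fun) when is_function(Fun, 0) ->
    spawn(fun() ->
                  true = erlang:group_leader(whereis(user), self()),
                  Fun()
          end).
```

The process must also stay unlinked from anything inside the application, otherwise the usual exit signals would still reach it.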

I made that mistake too, and found out that it must happen.
When the application stops, every process spawned from within it dies, whether or not it is registered or linked.
If this did not happen, we would have to track all up-and-running processes and figure out which ones are orphaned and which are not. You can guess how difficult that would be. You can think of it, loosely, like Unix parent and child processes going down together. This is why I think it must happen.
If you want to have processes independent of your application, you can send a message to another application asking it to start them:
other_application_module:start_process(ProcessInfo).

Supervisors with backoff

I have a supervisor with two worker processes: a TCP client which handles connection to a remote server and an FSM which handles the connection protocol.
Handling TCP errors in the child process complicates code significantly. So I'd prefer to "let it crash", but this has a different problem: when the server is unreachable, the maximum number of restarts will be quickly reached and the supervisor will crash along with my entire application, which is quite undesirable for this case.
What I'd like is to have a restart strategy with back-off; failing that, it would be good enough if the supervisor was aware when it is restarted due to a crash (i.e. had it passed as a parameter to the init function). I've found this mailing list thread, but is there a more official/better tested solution?

You might find our supervisor cushion to be a good starting point. I use it to slow down the restart of things that must be running but are failing quickly on startup (such as ports that are encountering a resource problem).

I've had this problem many times working with Erlang and tried many solutions. I think the best bet I've found is to have an extra process that is started by the supervisor and in turn starts the child that might crash.
It starts the child on start-up, awaits child exits, and then either restarts the child (with a delay) or exits, as appropriate. I think this is simpler than the back-off server (which you link to), as you only need to keep state regarding a single child.
Another solution that I've used is to start the child processes as transient and have a separate process that polls and issues restarts to any processes that have crashed.

So first you want to catch an early termination of the child by calling process_flag(trap_exit, true) in your init.
Then you need to decide how long you want to delay a restart, for example 10 seconds, and schedule it in the clause that handles the exit message:
handle_info({'EXIT', _Pid, Reason}, State) ->
    erlang:send_after(10000, self(), {die, Reason}),
    {noreply, State};
Lastly, let the process die with
handle_info({die, Reason}, State) ->
    {stop, Reason, State}.
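Put together, the three steps might look like the gen_server below. It is a sketch under assumptions: the delayed_exit module, the StartFun argument (a fun that starts and links the fragile process and returns {ok, Pid}), and the configurable delay are all invented here to make the fragment self-contained.

```erlang
-module(delayed_exit).
-behaviour(gen_server).

-export([start_link/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% StartFun: hypothetical zero-arity fun that starts AND links the fragile
%% process (e.g. the TCP client) and returns {ok, Pid}.
%% DelayMs: how long to wait after a crash before exiting ourselves.
start_link(StartFun, DelayMs) ->
    gen_server:start_link(?MODULE, {StartFun, DelayMs}, []).

init({StartFun, DelayMs}) ->
    %% Step 1: turn exit signals from linked processes into messages.
    process_flag(trap_exit, true),
    {ok, Pid} = StartFun(),
    {ok, #{child => Pid, delay => DelayMs}}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Step 2: the linked child died; schedule our own exit instead of dying now.
handle_info({'EXIT', Pid, Reason}, #{child := Pid, delay := DelayMs} = State) ->
    erlang:send_after(DelayMs, self(), {die, Reason}),
    {noreply, State};
%% Step 3: the delay has passed; exit so the supervisor restarts us.
handle_info({die, Reason}, State) ->
    {stop, Reason, State};
handle_info(_Other, State) ->
    {noreply, State}.
```

With a 10 second delay the supervisor sees at most one crash per 10 seconds from this child, so a restart intensity such as 5 restarts per 10 seconds can no longer be exceeded by it alone.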
