I've got an Erlang application packed with Rebar that's meant to be run as a service. It clusters with other instances of itself.
One thing I've noticed is that if the application crashes on one node, the Erlang VM remains up even when the application reaches its supervisor's restart limit and vanishes forever. The result is that other nodes in the cluster don't notice anything until they try to talk to the application.
Is there a simple way to link the VM to the root supervisor, so that the application takes down the whole VM when it dies?
When starting your application with application:start/2 you can set the optional Type parameter to one of the atoms permanent, transient or temporary. I guess you are looking for permanent.
As mentioned in application:start/2:
If a permanent application terminates, all other applications and the entire Erlang node are also terminated.
If a transient application terminates with Reason == normal, this is reported but no other applications are terminated. If a transient application terminates abnormally, all other applications and the entire Erlang node are also terminated.
If a temporary application terminates, this is reported but no other applications are terminated.
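For example, assuming your application is named myapp (a placeholder name for this sketch), you would start it as:

application:start(myapp, permanent).

With that start type, if myapp reaches its supervisor's restart limit and terminates, the whole Erlang node terminates with it, so the other nodes in the cluster will notice that it is gone.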
I am having an existential question about how I design my Erlang applications:
I usually create an application, which starts a supervisor and some workers.
Aside from the supervision tree, I have modules with functions (duh).
I also have a web API that calls functions from applications' modules.
When I stop my application (application:stop(foo).), the webserver can still call foo's functions.
I find it "not idiomatic" to not be able to have a proper circuit-breaker for the foo application.
Does it mean that every public function from foo should spawn a process under its supervisor?
Thanks,
Bastien
Not necessarily, for two reasons:
The foo application will have two kinds of functions: those that require the worker processes to be running, and those that don't (most likely pure functions). If the application is stopped, obviously the former will fail when called, while the latter will still work. As per Erlang's "let it crash" philosophy, this is just another error condition that the web server needs to handle (or not handle). If the pure functions still work, there is no reason to prohibit the web server from calling them: it means that a greater portion of the system is functional (a small sketch of this distinction follows the next point).
In an Erlang node, stopping an application is not something you'd normally do. An Erlang application declares dependencies, that is, applications that need to be running for it to function correctly. You'll notice that if you try to start an application before its dependencies, it will refuse to start. While it's possible to stop applications manually, this means that the state of the node is no longer in accordance with the assumptions of the application model. When building a "release" consisting of a set of Erlang applications, normally they would all be started as permanent applications, meaning that if any one application crashes, the entire Erlang node would exit, in order not to violate this assumption.
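To illustrate the first point, here is a hypothetical foo module (the module, function and process names are made up for this sketch). pure_add/2 keeps working after application:stop(foo), while lookup/1 fails with a noproc error because the foo_worker gen_server is no longer running:

-module(foo).
-export([pure_add/2, lookup/1]).

%% Pure function: no running process required, still works when foo is stopped.
pure_add(A, B) ->
    A + B.

%% Depends on the foo_worker gen_server started by foo's supervisor,
%% so it fails once the application has been stopped.
lookup(Key) ->
    gen_server:call(foo_worker, {lookup, Key}).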
So far, I know that when I start my Elixir application, a bunch of dependent applications also get started.
Are these dependent applications started inside my app supervision tree somehow?
What happens if a dependent application crashes? Is it restarted?
I guess that Elixir works like Erlang with respect to applications.
In Erlang each application has an independent supervision tree.
If an application crashes, it means that the topmost supervisor crashed and that the whole restart strategy failed. There is little chance that simply adding another layer of supervision will solve the problem.
It is possible to start all the dependencies using application:ensure_all_started(Application[, StartType]), where StartType can be one of:
temporary (the default): nothing happens to the other applications if a temporary application stops for any reason
permanent: all other applications terminate if a permanent application stops for any reason
transient: all other applications terminate if a transient application stops for any reason other than normal
It is also possible to call application:ensure_started(Application[, StartType]) for each dependency. Note that in both cases the StartType only controls the effect of one application's termination on the others; no restart strategy is applied.
It is possible to know which applications are running using application:which_applications(). A short sketch of these calls follows this list.
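A minimal sketch of those calls, using the standard OTP applications inets and crypto purely as example dependencies:

%% Start inets and everything it depends on; the permanent start type means
%% that a later crash of a permanent application also terminates the node.
{ok, Started} = application:ensure_all_started(inets, permanent),

%% Start a single application (returns ok if it is already running).
ok = application:ensure_started(crypto, temporary),

%% List the applications currently running on this node.
Running = application:which_applications().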
In the Erlang VM, every application is started as a child of an application_master process.
Every application has a StartType, which can be one of temporary, transient or permanent.
A crash of a permanent (and, in some cases, a transient) application will take down the entire Erlang VM (the VM exits and a crash.dump file is created).
According to the Elixir Application module, you can set the type of your dependencies in start/2.
1> os:cmd("ping google.com").
When the above code is executed, two processes are created: an Erlang process and a system-level (OS) process.
Is there any library for Erlang that we can use to monitor the system-level process running "ping google.com"?
Using the erlexec application to run OS processes gives you a lot more control over those processes. You can send signals to them (e.g. to stop them), set up Erlang monitors for OS processes, and you get the exit status code when an OS process terminates (os:cmd doesn't give you that).
Take a look at the erlexec documentation.
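A rough sketch of what that looks like with erlexec (option and function names are taken from the erlexec documentation, so double-check them against the version you use):

%% Start the erlexec port program first (see the erlexec docs for options).
exec:start([]),

%% Run ping as an OS process; stdout is delivered to the calling process as
%% messages, and the monitor option notifies us when the OS process exits.
{ok, Pid, OsPid} = exec:run("ping google.com", [stdout, monitor]),

%% ... later, stop the OS process (erlexec sends it a termination signal).
exec:stop(Pid).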
I've set up a simple test-case at https://github.com/bvdeenen/otp_super_nukes_all that shows that an otp application:stop() actually kills all spawned processes by its children, even the ones that are not linked.
The test-case consists of one gen_server (registered as par) spawning a plain erlang process (registered as par_worker) and a gen_server (registered as reg_child), which also spawns a plain erlang process (registered as child_worker). Calling application:stop(test_app) does a normal termination on the 'par' gen_server, but an exit(kill) on all others!
Is this nominal behaviour? If so, where is it documented, and can I disable it? I want the processes I spawn (but do not link) from my gen_server to stay alive when the application terminates.
Thanks
Bart van Deenen
The application manual says (for the stop/1 function):
Last, the application master itself terminates. Note that all processes with the application master as group leader, i.e. processes spawned from a process belonging to the application, thus are terminated as well.
So I guess you can't modify this behavior.
EDIT: You might be able to change the group_leader of the started process with group_leader(GroupLeader, Pid) -> true (see: http://www.erlang.org/doc/man/erlang.html#group_leader-2). Changing the group_leader might allow you to avoid killing your process when the application ends.
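A minimal sketch of that idea (loop/0 and the choice of init as the new group leader are placeholders for this example):

%% Spawn the worker from inside the application as usual ...
Worker = spawn(fun loop/0),
%% ... then re-parent it to a group leader outside the application, so the
%% application master no longer treats it as one of the application's processes.
true = erlang:group_leader(whereis(init), Worker).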
I made that mistake too, and found out it must happen.
If the parent process dies, all child processes die, no matter whether they are registered or not.
If this did not happen, we would have to track all up-and-running processes and figure out which are orphaned and which are not; you can guess how difficult that would be. You can think of Unix ppid and pid: if you kill the ppid, all children die too. So I think this must happen.
If you want processes that are independent of your application, you can send a message to another application asking it to start them, for example:
other_application_module:start_process(ProcessInfo).
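A hypothetical sketch of what that module could look like (other_app_sup is an invented supervisor name). The worker ends up under the other application's supervision tree and group leader, so stopping your own application leaves it alone:

-module(other_application_module).
-export([start_process/1]).

%% Ask the other application's simple_one_for_one supervisor to own the worker.
start_process(ProcessInfo) ->
    supervisor:start_child(other_app_sup, [ProcessInfo]).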
I recently ran into a bug where an entire Erlang application died, yielding a log message that looked like this:
=INFO REPORT==== 11-Jun-2010::11:07:25 ===
application: myapp
exited: shutdown
type: temporary
I have no idea what triggered this shutdown, but the real problem I have is that it didn't restart itself. Instead, the now-empty Erlang VM just sat there doing nothing.
Now, from the research I've done, it looks like there are other "start types" you can give an application: 'transient' and 'permanent'.
If I start a Supervisor within an application, I can tell it to make a particular process transient or permanent, and it will automatically restart it for me. However, according to the documentation, if I make an application transient or permanent, it isn't restarted when it dies; instead, all the other applications are killed as well.
What I really want to do is somehow tell the Erlang VM that a particular application should always be running, and if it goes down, restart it. Is this possible to do?
(I'm not talking about implementing a supervisor on top of my application, because then it's a catch 22: what if my supervisor process crashes? I'm looking for some sort of API or setting that I can use to have Erlang monitor and restart my application for me.)
Thanks!
You should be able to fix this in the top-level supervisor: set the restart strategy to allow one million restarts every second, and the application should never crash. Something like:
init(_Args) ->
    {ok, {{one_for_one, 1000000, 1},
          [{ch3, {ch3, start_link, []},
            permanent, brutal_kill, worker, [ch3]}]}}.
(Example adapted from the OTP Design Principles User Guide.)
You can use heart to restart the entire VM if it goes down, then use a permanent application type to make sure that the VM exits when your application exits.
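For example, if the node is started with "erl -heart", you can tell heart what to run when the VM dies (the path below is a placeholder):

%% Only works when the emulator was started with the -heart flag.
ok = heart:set_cmd("/path/to/myapp/bin/start"),
{ok, Cmd} = heart:get_cmd().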
Ultimately you need something above your application that you need to trust, whether it is a supervisor process, the erlang VM, or some shell script you wrote - it will always be a problem if that happens to fail also.
Use Monit, then set up your application to terminate by using a supervisor for the whole application with a reasonable restart frequency. If the application terminates, the VM terminates, and Monit restarts everything.
I could never get heart to be reliable enough, as it only restarts the VM once, and it doesn't deal well with a kill -9 of the Erlang VM.