Erlang: Who supervises the supervisor? - erlang

In all Erlang supervisor examples I have seen yet, there usually is a "master" supervisor who supervises the whole tree (or at least is the root node in the supervisor tree). What if the "master"-supervisor breaks? How should the "master"-supervisor be supervised?? any typical pattern?

The top supervisor is started in your application start/2 callback using start_link, this means that it links with the application process. If the application process receives an exit signal from the top supervisor dying it does one of two things:
If the application is started as an permanent application the entire node i terminated (and maybe restarted using HEART).
If the application is started as temporary the application stops running, no restart attempts will be made.

Typically Supervisor is set to "only" supervise other processes. Which mens there is no user written code which is executed by Supervisor - so it very unlikely to crash.
Of course, this cannot be enforced ... So typical pattern is to not have any application specific logic in Supervisor ... It should only Supervise - and do nothing else.

Good question. I have to concur that all of the examples and tutorials mostly ignore the issue - even if occasionally someone mentions the issue (without providing an example solution):
If you want reliability, use at least two computers, and then make them supervise each other. How to actually implement that with OTP is (with the current state of documentation and tutorials), however, appears to be somewhere between well hidden and secret.

Related

Erlang applications design (how to short-circuit)

I am having an existential question about how I design my erlang applications:
I usually create an application, which starts a supervisor and some workers.
Aside from the supervision tree, I have modules with functions (duh).
I also have a web API that calls functions from applications' modules.
When I stop my application (application:stop(foo).), the webserver can still call foo's functions.
I find it "not idiomatic" to not be able to have a proper circuit-breaker for the foo application.
Does it mean that every public functions from foo should spawn a process under it's supervisor?
Thanks,
Bastien
Not necessarily, for two reasons:
The foo application will have two kinds of functions: those that require the worker processes to be running, and those that don't (most likely pure functions). If the application is stopped, obviously the former will fail when called, while the latter will still work. As per Erlang's "let it crash" philosophy, this is just another error condition that the web server needs to handle (or not handle). If the pure functions still work, there is no reason to prohibit the web server from calling them: it means that a greater portion of the system is functional.
In an Erlang node, stopping an application is not something you'd normally do. An Erlang application declares dependencies, that is, applications that need to be running for it to function correctly. You'll notice that if you try to start an application before its dependencies, it will refuse to start. While it's possible to stop applications manually, this means that the state of the node is no longer in accordance with the assumptions of the application model. When building a "release" consisting of a set of Erlang applications, normally they would all be started as permanent applications, meaning that if any one application crashes, the entire Erlang node would exit, in order not to violate this assumption.

Erlang supervisor processes

I have been learning Erlang intensively, and after finishing 'Programming Erlang' from Joe Armstrong, there is one thing that I keep coming back to.
In my mind a Supervisor spawns One process per child handler. So each declared gen_server type handler will run as a separate process.
What happens if you are building a tiny web server and you want each requests to be its own process. Do you still conform to OTP principles and use a gen_server somehow (how ?), or do you create your own behaviour?
How does Cowboy handle this for eg. ? Does it still use gen_server ?
tl;dr: I find that trying to figure out the "correct" supervision structure a the beginning of a project is a form of premature optimization.
The "right" way to design your supervision tree depends on what the worker parts of your application are doing. In the case of a web server I would probably first explore something along the lines of:
top supervisor (singular)
data service supervisor (one per service type)
worker pool (all workers under the service sup)
client connection supervisor (one)
connection worker pool (or one per connection, have to play with it to decide)
logical supervisor (as appropriate -- massive variance here, depending on problem domain)
workers or supervisors (as appropriate -- have to explore/know the problem domain to have any idea how this should be structured)
So that's several workers per supervisor type at the lower level. I haven't used Cowboy so I don't know how it is organized. The point I'm trying to make is that while the mechanics of handling data services serving web pages are relatively trivial, the part of the system that actually does the core problem-solving work might not be and this is going to dictate everything interesting about the system.
It is a bad thing to have your problem-solving bits mixed in the same module as your web-displaying or connection handling bits. Ideally you should be able to use the same logic units in a native application, a web application and a networked service without any changes.
Ultimately the answer to whether you should have 1:1 supervisors to workers or 1:n depends on what you're doing and what restart strategy gives you the best balance among recovery to a known consistent state, latency felt by the user, and resource usage.
One of my favorite things about Erlang is that I can start with a naive supervisor structure like the one above, play with it until I see where its not so good, and rather easily switch things around and experiment with alternatives without fundamentally altering my system much. (The same goes for playing with alternative data representations if you write proper abstractions around them.) So first, get something that works in testing. Then load it up and see if you can break it. Then start worrying about the details, after you understand where the problems actually are.
It is a common pattern to spawn one server per client in erlang, You will then use a supervisor using the simple_one_to_one strategy for the children servers. This allows to ask the server to start a server on_demand. Generally this is used when you don't know how many processes you will need, and when the processes are independent (a crash of one process should not impacts the other).
There is a very good information in the site learningyousomeerlang.com (LYSE supervisor chapter). the whole site is worth to read.

Erlang supervision and applications

I have number of supervised components that can stand alone as separate applications. But I would like to cascade them such that a call or event in a worker in one component starts the next component down in an inverted tree-like structure.
1) Can I package each of these components as separate applications?
2) If so,how do I write the calling code to start the child application?
3) Or do I need to do something else altogether and,if so, what?
Note: I'm still working on mastery of supervision trees. The chain of events following application:start(Mod) is still not burned well into my head.
Many thanks,
LRP
Supervision trees and applications are complex Erlang/OTP concepts. They are both documented in OTP Design Principles User's Guide and in particular in:
chapter 5: Supervisor behaviour
chapter 7: Applications.
Supervision trees are not dependency trees and should not be designed as such. Instead, a supervision tree should be defined based on the desired crash behavior, as well as the desired start order. As a reminder, every process running long enough must be in the supervision tree.
An application is a reusable component that can be started and stopped. Applications can depend on other applications. However, applications are meant to be started at boot-time, and not when an event occurs.
Processes can be started when a given event occurs. If such processes shall be supervised, simply call supervisor:start_child/2 on its supervisor when the event occurs. This will start the process and it will be inserted in the supervision tree. You will typically use a Simple-one-for-one supervisor which will initially have no child.
Consequently:
You can package components as separate applications. In this case, you will declare the dependencies of applications in each application's app(4) file. Applications could then only be started in the proper order, either with a boot script or interactively with application:start/1.
You can package all your components in a single application and have worker processes starting other worker processes with supervisor:start_child/2.
You can package components as separate applications and have worker processes in one application starting processes in another application. In this case, the best would be to define a module in the target application that will call supervisor:start_child/2 itself, as applications should have clean APIs.
When you have worker processes (parents) starting other worker processes (children), you probably will link those processes. Linking is achieved with link/1. Links are symmetric and are usually established from the parent since the parent knows the pid of the child. If the parent process exits abnormally, the child will be terminated, and reciprocally.
Links are the most common way to handle crashes, for example a child shall be terminated if the parent is no longer there. Links are actually the foundation of OTP supervision. Adding links between worker processes reveals that designing supervision trees is actually difficult. Indeed, with links, you will have both processes terminating if one crashes, and yet, you probably do not want the child process to be restarted by the supervisor, as a supervisor-restarted child process will not be known (or linked) to a supervisor-restarted parent process.
If the parent shall terminate when the child exits normally, then this is a totally different design. You can either have the child send a message to the parent (e.g. return a result) or the parent monitor the child.
Finally, the parent process can terminate a child process. If the child is supervised, use supervisor:terminate_child/2. Otherwise, you can simply send an exit signal to the child process. In either cases, you will need to unlink the child process to avoid an exit of the parent process.
Both links and monitors are documented in the Erlang Reference Manual User's Guide. Instead of monitors, you might be tempted to trap exits, something explained in the guide. However, the manual page for the function to achieve this (process_flag/2) specifically reads:
Application processes should normally not trap exits.
This is typical OTP design wisdom, spread here and there in the documentation. Use monitors or simple messages instead.

Otp application:stop(..) kills all spawned processes, not just spawn_linked ones?

I've set up a simple test-case at https://github.com/bvdeenen/otp_super_nukes_all that shows that an otp application:stop() actually kills all spawned processes by its children, even the ones that are not linked.
The test-case consists of one gen_server (registered as par) spawning a plain erlang process (registered as par_worker) and a gen_server (registered as reg_child), which also spawns a plain erlang process (registered as child_worker). Calling application:stop(test_app) does a normal termination on the 'par' gen_server, but an exit(kill) on all others!
Is this nominal behaviour? If so, where is it documented, and can I disable it? I want the processes I spawn from my gen_server (not link), to stay alive when the application terminates.
Thanks
Bart van Deenen
The application manual says (for the stop/1 function):
Last, the application master itself terminates. Note that all processes with the
application master as group leader, i.e. processes spawned from a process belonging
to the application, thus are terminated as well.
So I guess you cant modify this behavior.
EDIT: You might be able to change the group_leader of the started process with group_leader(GroupLeader, Pid) -> true (see: http://www.erlang.org/doc/man/erlang.html#group_leader-2). Changing the group_leader might allow you to avoid killing your process when the application ends.
I made that mistakes too, and found out it must happen.
If parent process dies, all children process dies no matter what it is registered or not.
If this does not happen, we have to track all up-and-running processes and figure out which is orphaned and which is not. you can guess how difficult it would be. You can think of unix ppid and pid. if you kill ppid, all children dies too. This, I think this must happen.
If you want to have processes independent from your application, you can send a messageto other application to start processes.
other_application_module:start_process(ProcessInfo).

mochiweb and gen_server

[This will only make sense if you've seen Kevin Smith's 'Erlang in Practice' screencasts]
I'm an Erlang noob trying to build a simple Erlang/OTP system with embedded webserver [mochiweb].
I've walked through the EIP screencasts, and I've toyed with simple mochiweb examples created using the new_mochiweb.erl script.
I'm trying to figure out how the webserver should relate to the gen_server modules. In the EIP examples [Ch7], the author creates a web_server.erl gen_server process and links the mochiweb_http process to it. However in a mochiweb project, the mochiweb_http process seems to be 'standalone'; it doesn't seem to be embedded in a separate gen_server process.
My question is, should one of these patterns be preferred over the other ? If so, why ? Or doesn't it matter ?
Thanks in advance.
You link processes to the supervisor hierarchy of your application for two reasons: 1) to be able to restart your worker processes if they crash, and 2) to be able to kill all your processes when you stop the application.
As the previous answer says, 1) is not the case for http requests handling processes. However, 2) is valid: if you let your processes alone, you can't guarantee that all your processes will be cleared from the VM after stopping your application (think of processes stuck in endless loops, waiting in receives, etc...).
The reason to embed a process in a supervision tree is so that you can restart it if it fails.
A process that handles an HTTP request is responding to an event generated externally - in a browser. It is not possible to restart it - that is the prerogative of the person running the browser - therefore it is not necessary to run it under OTP - you can just spawn it without supervision.

Resources