Erlang supervision and applications

I have a number of supervised components that can stand alone as separate applications. But I would like to cascade them such that a call or event in a worker in one component starts the next component down in an inverted tree-like structure.
1) Can I package each of these components as separate applications?
2) If so, how do I write the calling code to start the child application?
3) Or do I need to do something else altogether and, if so, what?
Note: I'm still working on mastery of supervision trees. The chain of events following application:start(Mod) is still not burned well into my head.
Supervision trees and applications are complex Erlang/OTP concepts. They are both documented in OTP Design Principles User's Guide and in particular in:
chapter 5: Supervisor behaviour
chapter 7: Applications.
Supervision trees are not dependency trees and should not be designed as such. Instead, a supervision tree should be defined based on the desired crash behavior, as well as the desired start order. As a reminder, every process running long enough must be in the supervision tree.
An application is a reusable component that can be started and stopped. Applications can depend on other applications. However, applications are meant to be started at boot-time, and not when an event occurs.
Processes can be started when a given event occurs. If such processes are to be supervised, simply call supervisor:start_child/2 on their supervisor when the event occurs. This starts the process and inserts it into the supervision tree. You will typically use a simple_one_for_one supervisor, which initially has no children.
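As a rough illustration of the supervisor:start_child/2 approach described above, here is a minimal sketch. The module name event_sup and the functions start_worker/1 and worker_start_link/1 are hypothetical, and the worker is a plain process kept in the same module only so that the example is self-contained; in a real system it would usually be a gen_server in its own module.

```erlang
-module(event_sup).            % hypothetical module name
-behaviour(supervisor).

-export([start_link/0, start_worker/1]).
-export([init/1, worker_start_link/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Call this when the triggering event occurs.
%% Arg is appended to the argument list in the child spec.
start_worker(Arg) ->
    supervisor:start_child(?MODULE, [Arg]).

init([]) ->
    %% simple_one_for_one: no children at start, all added dynamically.
    SupFlags = #{strategy => simple_one_for_one, intensity => 5, period => 10},
    ChildSpec = #{id => worker,
                  start => {?MODULE, worker_start_link, []},
                  restart => temporary,
                  shutdown => 5000,
                  type => worker,
                  modules => [?MODULE]},
    {ok, {SupFlags, [ChildSpec]}}.

%% Start function for the child: must link to the supervisor
%% and return {ok, Pid}.
worker_start_link(Arg) ->
    Pid = proc_lib:spawn_link(fun() -> worker_loop(Arg) end),
    {ok, Pid}.

worker_loop(Arg) ->
    receive
        {get_arg, From} -> From ! {arg, Arg}, worker_loop(Arg);
        stop -> ok
    end.
```

After event_sup:start_link(), each call to event_sup:start_worker(Arg) adds one supervised worker to the tree.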
You can package components as separate applications. In this case, you declare each application's dependencies in its app(4) file. Applications can then only be started in the proper order, either with a boot script or interactively with application:start/1.
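A dependency declaration in an application resource file might look roughly like this. The application names upstream and downstream and the module names are hypothetical; the point is only the applications key, which lists what must already be running.

```erlang
%% downstream.app (hypothetical application)
{application, downstream,
 [{description, "Component started after upstream"},
  {vsn, "0.1.0"},
  {modules, [downstream_app, downstream_sup]},
  {registered, []},
  %% upstream must be started before downstream
  {applications, [kernel, stdlib, upstream]},
  {mod, {downstream_app, []}}]}.
```

With such a file, application:start(downstream) returns an error if upstream is not running, while application:ensure_all_started(downstream) starts the missing dependencies first.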
You can package all your components in a single application and have worker processes starting other worker processes with supervisor:start_child/2.
You can package components as separate applications and have worker processes in one application starting processes in another application. In this case, the best would be to define a module in the target application that will call supervisor:start_child/2 itself, as applications should have clean APIs.
When you have worker processes (parents) starting other worker processes (children), you will probably link those processes. Linking is achieved with link/1. Links are symmetric and are usually established from the parent since the parent knows the pid of the child. If the parent process exits abnormally, the child will be terminated, and vice versa.
Links are the most common way to handle crashes: for example, a child should be terminated if the parent is no longer there. Links are actually the foundation of OTP supervision. Adding links between worker processes reveals that designing supervision trees is actually difficult. Indeed, with links, both processes terminate if one crashes, and yet you probably do not want the child process to be restarted by the supervisor, as a supervisor-restarted child process will not be known (or linked) to a supervisor-restarted parent process.
If the parent shall terminate when the child exits normally, then this is a totally different design. You can either have the child send a message to the parent (e.g. return a result) or the parent monitor the child.
Finally, the parent process can terminate a child process. If the child is supervised, use supervisor:terminate_child/2. Otherwise, you can simply send an exit signal to the child process. In either case, you will need to unlink the child process to avoid an exit of the parent process.
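For the unsupervised case, the unlink-then-exit sequence could be sketched as follows. The module name stop_child_demo is hypothetical, and the monitor is added here only so that the example can report the child's exit reason.

```erlang
-module(stop_child_demo).      % hypothetical module name
-export([run/0]).

run() ->
    %% A linked child that just waits for a message.
    Child = spawn_link(fun() -> receive stop -> ok end end),
    %% Remove the link first, so the child's exit does not
    %% propagate back to this (parent) process.
    unlink(Child),
    Ref = monitor(process, Child),
    %% Send the exit signal; the child does not trap exits, so it dies.
    exit(Child, shutdown),
    receive
        {'DOWN', Ref, process, Child, Reason} -> {child_stopped, Reason}
    after 1000 ->
        timeout
    end.
```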
Both links and monitors are documented in the Erlang Reference Manual User's Guide. Instead of monitors, you might be tempted to trap exits, something explained in the guide. However, the manual page for the function to achieve this (process_flag/2) specifically reads:
Application processes should normally not trap exits.
This is typical OTP design wisdom, spread here and there in the documentation. Use monitors or simple messages instead.
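A parent that wants to react to a child's exit without trapping exits can use a monitor. The following is a minimal sketch under that assumption; the module name monitor_demo is hypothetical.

```erlang
-module(monitor_demo).         % hypothetical module name
-export([run/0]).

run() ->
    %% spawn_monitor/1 creates the child and the monitor atomically.
    {Pid, Ref} = spawn_monitor(fun() -> ok end),
    receive
        %% Delivered as an ordinary message when the child exits,
        %% whatever the exit reason (here: normal).
        {'DOWN', Ref, process, Pid, Reason} -> {child_exited, Reason}
    after 1000 ->
        timeout
    end.
```

Unlike a link, the monitor is one-directional: the parent is informed of the child's exit but is not itself terminated by it.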


Erlang applications design (how to short-circuit)

I am having an existential question about how I design my Erlang applications:
I usually create an application, which starts a supervisor and some workers.
Aside from the supervision tree, I have modules with functions (duh).
I also have a web API that calls functions from applications' modules.
When I stop my application (application:stop(foo).), the webserver can still call foo's functions.
I find it "not idiomatic" to not be able to have a proper circuit-breaker for the foo application.
Does it mean that every public function from foo should spawn a process under its supervisor?
Not necessarily, for two reasons:
The foo application will have two kinds of functions: those that require the worker processes to be running, and those that don't (most likely pure functions). If the application is stopped, obviously the former will fail when called, while the latter will still work. As per Erlang's "let it crash" philosophy, this is just another error condition that the web server needs to handle (or not handle). If the pure functions still work, there is no reason to prohibit the web server from calling them: it means that a greater portion of the system is functional.
In an Erlang node, stopping an application is not something you'd normally do. An Erlang application declares dependencies, that is, applications that need to be running for it to function correctly. You'll notice that if you try to start an application before its dependencies, it will refuse to start. While it's possible to stop applications manually, this means that the state of the node is no longer in accordance with the assumptions of the application model. When building a "release" consisting of a set of Erlang applications, normally they would all be started as permanent applications, meaning that if any one application crashes, the entire Erlang node would exit, in order not to violate this assumption.
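To illustrate the first point, a caller such as the web server could treat "the worker process is not running" as just another error. This is a minimal sketch; the module foo_client and the function safe_call/2 are hypothetical and not part of any library.

```erlang
-module(foo_client).           % hypothetical module name
-export([safe_call/2]).

%% Calls a registered gen_server, turning "process not running"
%% into an ordinary error value instead of crashing the caller.
safe_call(Server, Request) ->
    try
        gen_server:call(Server, Request)
    catch
        %% gen_server:call/2 exits with noproc when the name
        %% is not registered (e.g. the application is stopped).
        exit:{noproc, _} -> {error, not_running}
    end.
```

Whether to catch the exit at all or simply let the calling process crash is a design choice; both are consistent with the reasoning above.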

In Erlang, what's the difference between gen_server:start() and gen_server:start_link()?

Can someone explain what's the difference between gen_server:start() and gen_server:start_link()?
I've been told that it's something about multi threading stuff.
If my gen_server is called from multiple threads, will it execute them all at once? Or will it create concurrency between these threads?
Both functions start a new gen_server process from the calling process, but they differ in that gen_server:start_link/3,4 atomically starts the gen_server and links it to the calling (parent) process. Linking means that if the child exits abnormally, the parent will by default also exit. Supervisors are parent processes that use links to take specific actions when their child processes exit abnormally, typically restarting them.
Other than the linking involved in the gen_server:start_link case, there are no multi-process aspects involved in these calls. Regardless of whether you use gen_server:start or gen_server:start_link to start a new gen_server, the new process has a single message queue, and it receives and processes those messages one at a time. There is nothing about gen_server:start_link that causes the new gen_server process to behave or perform differently than it would if started with gen_server:start.
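A minimal gen_server that exposes both variants makes the point concrete. The module name counter and its increment/1 function are hypothetical; this is a sketch, not a recommended structure.

```erlang
-module(counter).              % hypothetical module name
-behaviour(gen_server).

-export([start/0, start_link/0, increment/1]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Not linked: a crash here does not affect the caller.
start() ->
    gen_server:start(?MODULE, 0, []).

%% Linked: an abnormal exit propagates to the caller,
%% which is what a supervisor relies on.
start_link() ->
    gen_server:start_link(?MODULE, 0, []).

increment(Pid) ->
    gen_server:call(Pid, increment).

init(N) ->
    {ok, N}.

%% Requests are taken from the single message queue one at a time,
%% regardless of how the server was started.
handle_call(increment, _From, N) ->
    {reply, N + 1, N + 1}.

handle_cast(_Msg, N) ->
    {noreply, N}.
```

Calls from several client processes are serialized by the server's mailbox in both cases; the only difference between start/0 and start_link/0 is the link.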
When you use gen_server:start_link, the new process becomes a "child" of the calling process: it's part of the supervision tree. It allows the calling process to be notified if the gen_server process dies.
Using gen_server:start will spawn the process outside of the supervision tree.
Nice description of supervision in Erlang is here:

which behaviour should be implemented?

So we have generic part and the specific part thanks to erlang/OTP behaviours which greatly simplifies our code and gives us a common skeleton that everybody understands.
I am an Erlang newbie and trying to get my head around concepts of OTP.
But amongst the OTP behaviours, which behaviour is suitable for what kind of application?
The reason why I am asking this question is that, after going through some verbose theory, I am still unsure about which is better suited to an XYZ application.
Is there any criterion or rule of thumb, or is it just the programmer's instincts that matter in this case?
Any help?
A remark: this is a very general question that doesn't fit this community well.
Supervisors are there to control the way your application starts and evolves over time.
A supervisor starts processes in the right order and verifies that everything is going fine; it manages the restart strategy in case of failure, pools of processes, the supervision tree... The advantages are that it helps you follow the "let it crash" rule in your application, and that it concentrates the complex task of coordinating all your processes in a few lines of code.
I use gen_event mainly for logging purposes. You can use this behaviour to attach different handlers on demand, depending on your target. In one of my applications, I have by default a graphical logger that lets me check the behaviour at a glance, and a text file handler that allows deeper and post-mortem analysis.
Gen_server is the basic building block of all applications. It lets you keep a state and react to calls, casts and even simple messages. It manages code changes and provides features for start and stop. Most of my modules are gen_servers; you can see one as an actor which is responsible for executing a single task.
Gen_fsm adds to gen_server the notion of a current state with a specific behaviour, and transitions from state to state. It can be used, for example, in a game to control the evolution of the game (configuration, waiting for players, play, change level...).
When you create an application, you identify the different tasks you have to do and assign them to either gen_server or gen_fsm. You add gen_event to allow a publish/subscribe mechanism. Then you group all processes according to their dependencies and error recovery. This gives you the supervision tree and restart strategy.
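The game example above could be sketched as a small state machine. Note that this sketch uses gen_statem rather than gen_fsm, since gen_fsm is deprecated in favour of gen_statem in recent OTP releases; the module name game_fsm, the two states and the event names are all hypothetical.

```erlang
-module(game_fsm).             % hypothetical module name
-behaviour(gen_statem).

-export([start_link/0, players_ready/1, current_state/1]).
-export([init/1, callback_mode/0, waiting/3, playing/3]).

start_link() ->
    gen_statem:start_link(?MODULE, [], []).

players_ready(Pid) ->
    gen_statem:cast(Pid, players_ready).

current_state(Pid) ->
    gen_statem:call(Pid, current_state).

%% One callback function per state name.
callback_mode() ->
    state_functions.

init([]) ->
    {ok, waiting, #{}}.

%% State: waiting for players.
waiting(cast, players_ready, Data) ->
    {next_state, playing, Data};
waiting({call, From}, current_state, Data) ->
    {keep_state, Data, [{reply, From, waiting}]}.

%% State: game in progress. Other casts are ignored here.
playing({call, From}, current_state, Data) ->
    {keep_state, Data, [{reply, From, playing}]};
playing(cast, _Event, Data) ->
    {keep_state, Data}.
```

The same event (here current_state) is handled differently depending on the state the process is in, which is the property that distinguishes this behaviour from a plain gen_server.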
I use the following rules.
If you have some long-living entity that serves multiple clients in a request-response manner, use gen_server.
If this long-living entity should support a more complex protocol than just request/response, it is a good idea to use gen_fsm. I like to use gen_fsm for network protocol implementations, for example.
If you need some abstract event bus, where any process can publish messages and any client may want to receive these messages, you need gen_event.
If you have some processes which must be restarted together (for example, if one of them fails), you should put these processes under a supervisor, which ties them together.
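For the last rule, "restarted together" corresponds to the one_for_all restart strategy. The following is a minimal sketch; the module name pair_sup and the child ids reader and writer are hypothetical, and the workers are plain processes in the same module only to keep the example self-contained.

```erlang
-module(pair_sup).             % hypothetical module name
-behaviour(supervisor).

-export([start_link/0, init/1, start_worker/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% one_for_all: if one child dies, all children are
    %% terminated and then restarted together.
    SupFlags = #{strategy => one_for_all, intensity => 3, period => 10},
    {ok, {SupFlags, [child(reader), child(writer)]}}.

child(Id) ->
    #{id => Id,
      start => {?MODULE, start_worker, [Id]},
      restart => permanent,
      shutdown => 1000,
      type => worker,
      modules => [?MODULE]}.

%% Placeholder worker: a linked process that waits for a stop message.
start_worker(_Id) ->
    Pid = proc_lib:spawn_link(fun() -> receive stop -> ok end end),
    {ok, Pid}.
```

With one_for_one instead, only the failed child would be restarted; rest_for_one restarts the failed child and those started after it.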

Simple_one_for_one application

I have a supervisor which starts simple_one_for_one children. Each child is in fact a supervisor which has its own tree. Each child is started with a unique ID, so I can distinguish them. Each gen_server is then started with start_link(Id), where:
-define(SERVER(Id), {global, {Id, ?MODULE}}).

start_link(Id) ->
    gen_server:start_link(?SERVER(Id), ?MODULE, [Id], []).
So, each gen_server can easily be addressed with {global, {Id, module_name}}.
Now I'd like to make this child supervisor into an application. So, my mother supervisor should start applications instead of supervisors. That should be straightforward, except for one part: passing an ID to an application. Starting a supervisor with an ID is easy: supervisor:start_child(?SERVER, [Id]). How do I do it for an application? How can I start several applications of the same name (so I can access the same .app file) with different IDs (so I can start my children with supervisor:start_child(?SERVER, [Id]))?
If my question is not clear enough, here is my code. So, currently, es_simulator_dispatcher starts es_simulator_sup. I'd like to have this: es_simulator_dispatcher starts es_simulator_app which starts es_simulator_sup. That's all there is to it :-)
Applications don't run under anything else; they are a top-level abstraction. When you start an application with application:start/1, the application is started by the application controller, which manages applications. Applications contain code and data, and possibly, at runtime, a supervision tree of processes doing the application's work. Running multiple instances of an application does not really make sense because of the nature of applications.
I would suggest reading OTP Design Principles User's Guide for a description of the components of OTP, how they relate and how they are intended to be used.
I don't think applications were meant for dynamic construction like you want. I'd make a single application, because in Erlang, applications are bundles of code more than they are bundles of running processes (you can say they are an artifact of compile time more than of runtime).
Usually you feed configuration to an application through the built-in configuration system. That is, the application uses application:get_env(Key) to read the values it should use. There is also an application:set_env(...) to feed specific configuration into one, but the preferred way is the config file on disk. This may or may not work in your case.
In some sense, what you are up to corresponds to creating 200 Apache configuration files and then spawning 200 Apache systems next to each other, rather than running a single one and handling the multiple domains inside it.
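Reading such configuration might look roughly like this. The application name es_simulator is taken from the question's module names, while the module es_config and the key simulator_ids are hypothetical.

```erlang
-module(es_config).            % hypothetical module name
-export([simulator_ids/0]).

%% Returns the configured list of simulator IDs, or [] if the
%% key is not set. application:get_env/3 takes an explicit
%% application name and a default value.
simulator_ids() ->
    application:get_env(es_simulator, simulator_ids, []).
```

A single application could then read this list at startup and call supervisor:start_child/2 once per ID, instead of starting one application per ID.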

Erlang: Who supervises the supervisor?

In all the Erlang supervisor examples I have seen so far, there is usually a "master" supervisor which supervises the whole tree (or at least is the root node in the supervisor tree). What if the "master" supervisor breaks? How should the "master" supervisor be supervised? Is there any typical pattern?
The top supervisor is started in your application's start/2 callback using start_link, which means that it links with the application process. If the application process receives an exit signal from the top supervisor dying, it does one of two things:
If the application is started as a permanent application, the entire node is terminated (and maybe restarted using heart).
If the application is started as temporary, the application stops running; no restart attempts will be made.
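The start/2 callback mentioned above can be sketched as follows. The module name my_app is hypothetical, and the application and supervisor callbacks are combined in one module only to keep the example self-contained; they normally live in separate modules.

```erlang
-module(my_app).               % hypothetical module name
-behaviour(application).
-behaviour(supervisor).

-export([start/2, stop/1]).    % application callbacks
-export([init/1]).             % supervisor callback

%% Must return {ok, Pid} of the top supervisor, which is
%% linked to the process that calls start/2.
start(_StartType, _StartArgs) ->
    supervisor:start_link({local, my_top_sup}, ?MODULE, []).

stop(_State) ->
    ok.

%% Top supervisor with no application logic and, in this
%% sketch, no children.
init([]) ->
    {ok, {#{strategy => one_for_one, intensity => 1, period => 5}, []}}.
```

Given a matching .app file, application:start(my_app, permanent) selects the first behaviour described above, while application:start(my_app) uses the default type, temporary.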
Typically a supervisor is set up to "only" supervise other processes, which means that no user-written code is executed by the supervisor, so it is very unlikely to crash.
Of course, this cannot be enforced... So the typical pattern is to not have any application-specific logic in the supervisor. It should only supervise, and do nothing else.
Good question. I have to concur that the examples and tutorials mostly ignore the issue, even if occasionally someone mentions it (without providing an example solution):
If you want reliability, use at least two computers, and then make them supervise each other. How to actually implement that with OTP, however, appears (with the current state of documentation and tutorials) to be somewhere between well hidden and secret.
