I want to be able to spawn a lot of processes that process data and fit them into a supervision tree. However all default behaviours, namely gen_server, gen_fsm, and gen_event, are event-driven. They have to receive messages to do stuff. What I need are just processes that process data, and in case they terminate abnormally, they should be restarted by their supervisor. What's the best way to go about doing this?
Yes, the standard behaviours all function as servers in that they sit and wait for requests before they do something. However, OTP is open in the sense that it provides the tools you need to implement processes which are not behaviours but which still fit into supervision trees and do "the right thing". For a description of what needs to be done and how to do it, see the section on special processes in the Erlang documentation.
This is really not surprising, as all of the OTP behaviours are implemented in Erlang, so all the "tools" are already there in the libraries.
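To make that concrete, here is a minimal sketch of such a special process built on proc_lib and sys, roughly following what that section describes. The module name data_worker and the do_some_work/0 placeholder are invented for the example; the real processing loop would go there.

-module(data_worker).
-export([start_link/0]).
-export([init/1, system_continue/3, system_terminate/4, system_code_change/4]).

start_link() ->
    proc_lib:start_link(?MODULE, init, [self()]).

init(Parent) ->
    Deb = sys:debug_options([]),
    %% Tell the supervisor (via proc_lib) that initialization succeeded.
    proc_lib:init_ack(Parent, {ok, self()}),
    loop(Parent, Deb).

loop(Parent, Deb) ->
    do_some_work(),   %% placeholder: the actual data processing goes here
    receive
        {system, From, Request} ->
            %% Hand system messages (suspend, resume, code change...) to sys.
            sys:handle_system_msg(Request, From, Parent, ?MODULE, Deb, []);
        stop ->
            exit(normal)
    after 0 ->
        loop(Parent, Deb)
    end.

do_some_work() ->
    ok.

%% Callbacks required by sys:handle_system_msg/6.
system_continue(Parent, Deb, _Misc) ->
    loop(Parent, Deb).

system_terminate(Reason, _Parent, _Deb, _Misc) ->
    exit(Reason).

system_code_change(Misc, _Module, _OldVsn, _Extra) ->
    {ok, Misc}.

Such a process can be listed as a worker in a supervisor's child specification just like any gen_server.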
So we have the generic part and the specific part, thanks to Erlang/OTP behaviours, which greatly simplifies our code and gives us a common skeleton that everybody understands.
I am an Erlang newbie and trying to get my head around concepts of OTP.
But amongst
gen_server
gen_fsm
gen_event
supervisor
which behaviour is suitable for what kind of application?
The reason I am asking this question is that after going through some verbose theory, I am still unsure about which is more suited for an XYZ application.
Are there any criteria or rules of thumb, or is it just the programmer's instinct that matters in this case?
Any help?
As a remark, this is a very general question that doesn't fit well with the usage of this community.
Supervisors are there to control the way your application starts and evolves over time.
A supervisor starts processes in the right order, verifies that everything is going fine, and manages the restart strategy in case of failure, pools of processes, the supervision tree... The advantage is that it helps you follow the "let it crash" rule in your application, and it concentrates in a few lines of code the complex task of coordinating all your processes.
I use gen_event mainly for logging purposes. You can use this behaviour to attach different handlers on demand, depending on your target. In one of my applications, I have by default a graphical logger that lets me check the behaviour at a glance, and a text-file handler that allows deeper and post-mortem analysis.
gen_server is the basic brick of all applications. It lets you keep state and react to calls, casts and even plain messages. It manages code changes and provides hooks for start and stop. Most of my modules are gen_servers; you can see one as an actor responsible for executing a single task (a minimal skeleton is sketched after this answer).
gen_fsm adds to gen_server the notion of a current state with specific behaviour, and transitions from state to state. It can be used, for example, in a game to control the evolution of the game (configuration, waiting for players, playing, changing level...).
When you create an application, you will identify the different tasks you have to do and assign each of them to either a gen_server or a gen_fsm. You will add gen_event to allow a publish/subscribe mechanism. Then you will group all the processes according to their dependencies and error recovery. This will give you the supervision tree and restart strategy.
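To make the gen_server part concrete, here is a minimal skeleton of a made-up counter_server showing state, a synchronous call and an asynchronous cast; it is only a sketch, not taken from any of the applications mentioned above.

-module(counter_server).
-behaviour(gen_server).
-export([start_link/0, increment/0, value/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% API
increment() -> gen_server:cast(?MODULE, increment).   %% asynchronous
value()     -> gen_server:call(?MODULE, value).       %% synchronous

%% Callbacks: the state is simply an integer.
init([]) -> {ok, 0}.

handle_call(value, _From, Count) ->
    {reply, Count, Count}.

handle_cast(increment, Count) ->
    {noreply, Count + 1}.

handle_info(_Msg, Count) ->   %% plain messages end up here
    {noreply, Count}.

terminate(_Reason, _Count) -> ok.
code_change(_OldVsn, Count, _Extra) -> {ok, Count}.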
I use the following rules.
If you have some long-lived entity that serves multiple clients in a request-response manner, use gen_server.
If this long-lived entity has to support a more complex protocol than just request/response, it is a good idea to use gen_fsm. I like to use gen_fsm for network protocol implementations, for example.
If you need an abstract event bus, where any process can publish messages and any client may want to receive them, you need gen_event.
If you have some processes that must be restarted together (for example, if one of them fails), you should put those processes under a supervisor, which ties them together (a sketch follows below).
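As an illustration of the last rule, here is a minimal sketch of a supervisor using the one_for_all strategy; producer and consumer are hypothetical worker modules that must live and die together.

-module(pair_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% one_for_all: if one child terminates abnormally, all children are
    %% terminated and restarted, at most 5 times within 10 seconds.
    {ok, {{one_for_all, 5, 10},
          [{producer, {producer, start_link, []},
            permanent, 5000, worker, [producer]},
           {consumer, {consumer, start_link, []},
            permanent, 5000, worker, [consumer]}]}}.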
I have been learning Erlang intensively, and after finishing 'Programming Erlang' by Joe Armstrong, there is one thing that I keep coming back to.
In my mind, a supervisor spawns one process per child handler. So each declared gen_server-type handler will run as a separate process.
What happens if you are building a tiny web server and you want each request to be its own process? Do you still conform to OTP principles and use a gen_server somehow (and how?), or do you create your own behaviour?
How does Cowboy handle this, for example? Does it still use gen_server?
tl;dr: I find that trying to figure out the "correct" supervision structure at the beginning of a project is a form of premature optimization.
The "right" way to design your supervision tree depends on what the worker parts of your application are doing. In the case of a web server I would probably first explore something along the lines of:
top supervisor (singular)
data service supervisor (one per service type)
worker pool (all workers under the service sup)
client connection supervisor (one)
connection worker pool (or one per connection, have to play with it to decide)
logical supervisor (as appropriate -- massive variance here, depending on problem domain)
workers or supervisors (as appropriate -- have to explore/know the problem domain to have any idea how this should be structured)
So that's several workers per supervisor type at the lower level. I haven't used Cowboy so I don't know how it is organized. The point I'm trying to make is that while the mechanics of handling data services serving web pages are relatively trivial, the part of the system that actually does the core problem-solving work might not be, and this is going to dictate everything interesting about the system.
It is a bad thing to have your problem-solving bits mixed in the same module as your web-displaying or connection handling bits. Ideally you should be able to use the same logic units in a native application, a web application and a networked service without any changes.
Ultimately the answer to whether you should have 1:1 supervisors to workers or 1:n depends on what you're doing and what restart strategy gives you the best balance among recovery to a known consistent state, latency felt by the user, and resource usage.
One of my favorite things about Erlang is that I can start with a naive supervisor structure like the one above, play with it until I see where it's not so good, and rather easily switch things around and experiment with alternatives without fundamentally altering my system much. (The same goes for playing with alternative data representations if you write proper abstractions around them.) So first, get something that works in testing. Then load it up and see if you can break it. Then start worrying about the details, after you understand where the problems actually are.
It is a common pattern in Erlang to spawn one server per client. You then use a supervisor with the simple_one_for_one strategy for the child servers. This lets you ask the supervisor to start a child on demand (see the sketch below). Generally this is used when you don't know how many processes you will need, and when the processes are independent (a crash of one process should not impact the others).
There is very good information on the site learnyousomeerlang.com (the LYSE chapter on supervisors). The whole site is worth reading.
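Here is a minimal sketch of that pattern; client_server is an invented child module whose start_link/1 would receive the socket (or whatever per-client argument you pass to start_child/2).

-module(client_sup).
-behaviour(supervisor).
-export([start_link/0, start_client/1, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Called for every new client; the extra arguments are appended to
%% the argument list given in the child spec below.
start_client(Socket) ->
    supervisor:start_child(?MODULE, [Socket]).

init([]) ->
    %% simple_one_for_one: no child is started up front; all children
    %% share one spec and are started on demand with start_child/2.
    {ok, {{simple_one_for_one, 10, 10},
          [{client_server, {client_server, start_link, []},
            temporary, 5000, worker, [client_server]}]}}.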
I have a number of supervised components that can stand alone as separate applications. But I would like to cascade them such that a call or event in a worker in one component starts the next component down in an inverted tree-like structure.
1) Can I package each of these components as separate applications?
2) If so, how do I write the calling code to start the child application?
3) Or do I need to do something else altogether and, if so, what?
Note: I'm still working on mastery of supervision trees. The chain of events following application:start(Mod) is still not burned well into my head.
Many thanks,
LRP
Supervision trees and applications are complex Erlang/OTP concepts. They are both documented in OTP Design Principles User's Guide and in particular in:
chapter 5: Supervisor behaviour
chapter 7: Applications.
Supervision trees are not dependency trees and should not be designed as such. Instead, a supervision tree should be defined based on the desired crash behavior, as well as the desired start order. As a reminder, every process running long enough must be in the supervision tree.
An application is a reusable component that can be started and stopped. Applications can depend on other applications. However, applications are meant to be started at boot-time, and not when an event occurs.
Processes can be started when a given event occurs. If such processes shall be supervised, simply call supervisor:start_child/2 on their supervisor when the event occurs. This will start the process and insert it into the supervision tree. You will typically use a simple_one_for_one supervisor, which initially has no children.
Consequently:
You can package components as separate applications. In this case, you will declare the dependencies of applications in each application's .app file (see app(4)). Applications can then only be started in the proper order, either with a boot script or interactively with application:start/1.
You can package all your components in a single application and have worker processes starting other worker processes with supervisor:start_child/2.
You can package components as separate applications and have worker processes in one application starting processes in another application. In this case, the best would be to define a module in the target application that will call supervisor:start_child/2 itself, as applications should have clean APIs.
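As a sketch of that third option, the target application could expose something like the following module; the names are invented, and target_sup is assumed to be a simple_one_for_one supervisor registered locally in that application.

-module(target_app_api).
-export([start_worker/1]).

%% Clean entry point for other applications: hides the supervisor name
%% and the child-spec details behind a single call.
start_worker(Args) ->
    supervisor:start_child(target_sup, [Args]).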
When you have worker processes (parents) starting other worker processes (children), you will probably link those processes. Linking is achieved with link/1. Links are symmetric and are usually established from the parent, since the parent knows the pid of the child. If the parent process exits abnormally, the child will be terminated, and vice versa.
Links are the most common way to handle crashes, for example a child shall be terminated if the parent is no longer there. Links are actually the foundation of OTP supervision. Adding links between worker processes reveals that designing supervision trees is actually difficult. Indeed, with links, you will have both processes terminating if one crashes, and yet, you probably do not want the child process to be restarted by the supervisor, as a supervisor-restarted child process will not be known (or linked) to a supervisor-restarted parent process.
If the parent shall terminate when the child exits normally, then this is a totally different design. You can either have the child send a message to the parent (e.g. return a result) or have the parent monitor the child.
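A small sketch of the monitoring variant (runnable in the shell; the sleeping fun stands in for the child's real work):

%% Spawn the child and monitor it; there is no link, so the parent is
%% not affected by whatever happens to the child.
{Pid, MonRef} = spawn_monitor(fun() -> timer:sleep(100) end),
receive
    {'DOWN', MonRef, process, Pid, normal} -> child_finished;
    {'DOWN', MonRef, process, Pid, Reason} -> {child_failed, Reason}
end.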
Finally, the parent process can terminate a child process. If the child is supervised, use supervisor:terminate_child/2. Otherwise, you can simply send an exit signal to the child process. In either case, you will need to unlink the child process to avoid an exit of the parent process.
Both links and monitors are documented in the Erlang Reference Manual User's Guide. Instead of monitors, you might be tempted to trap exits, something explained in the guide. However, the manual page for the function to achieve this (process_flag/2) specifically reads:
Application processes should normally not trap exits.
This is typical OTP design wisdom, spread here and there in the documentation. Use monitors or simple messages instead.
My Erlang app will create up to 1'000'000 processes. Each process will be a gen_server. From time to time each process will get some messages.
I am looking for a robust process keeper for my Erlang app. Any ideas on what would be the best match for my needs?
Additional info:
Read here why the built-in process registry is not good enough for tasks like mine: https://github.com/uwiger/gproc/blob/master/doc/erlang07-wiger.pdf
Do you need a better process registry? If so, take a look at gproc. It's a powerful alternative to standard OTP registered processes, which, among other things, suffer from the hard one-million-tuple limit of the Erlang VM.
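For flavour, registration with gproc typically looks something like the sketch below; it assumes gproc's reg/1, where/1 and send/2 functions and uses an invented session key, so check the gproc documentation for the exact semantics.

-module(session_registry).
-export([register_session/1, lookup_session/1, notify_session/2]).

%% Register the calling process under a unique local name.
register_session(UserId) ->
    true = gproc:reg({n, l, {user_session, UserId}}).

%% Find the pid currently holding that name (or undefined).
lookup_session(UserId) ->
    gproc:where({n, l, {user_session, UserId}}).

%% Send a message to whichever process holds the name.
notify_session(UserId, Event) ->
    gproc:send({n, l, {user_session, UserId}}, {session_event, Event}).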
Does anybody know if there is a sort of 'load balancer' in the Erlang standard library? I mean, if I have some really simple operations on a really large set of data, the overhead of constructing a process for every item will be larger than performing the operation sequentially. But if I can balance the work across the 'right number' of processes, it will perform better, so I'm basically asking if there is an easy way to accomplish this task.
By the way, does anybody know if an OTP application does some kind of load balancing? I mean, does an OTP application have the concept of a "worker process" (like a Java-ish worker thread)?
See modules pg2 and pool.
pg2 implements a quite simple distributed process pool. pg2:get_closest_pid/1 returns the "closest" pid, i.e. a random local process if available, otherwise a random remote process.
pool implements load balancing between nodes started with the slave module.
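A quick sketch of the pg2 side (the group name my_workers is made up):

%% At application startup: the group has to exist before anyone joins.
ok = pg2:create(my_workers).

%% In each worker process, typically during its init:
ok = pg2:join(my_workers, self()).

%% In a caller dispatching a request:
case pg2:get_closest_pid(my_workers) of
    Pid when is_pid(Pid) -> Pid ! do_work;
    {error, Reason}      -> {error, Reason}
end.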
The plists module probably does what you want. It is basically a parallel implementation of the lists module, designed to be used as a drop-in replacement. However, you can also control how it parallelizes its operations, for example by defining how many worker processes should be spawned, etc.
You would probably do that by calculating some number of workers depending on the length of the list or the load of the system, etc. (a small usage sketch follows the quoted description below).
From the website:
plists is a drop-in replacement for the Erlang module lists, making most list operations parallel. It can operate on each element in parallel, for IO-bound operations, on sublists in parallel, for taking advantage of multi-core machines with CPU-bound operations, and across Erlang nodes, for parallelizing inside a cluster. It handles errors and node failures. It can be configured, tuned, and tweaked to get optimal performance while minimizing overhead.
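As a usage sketch of the worker-count control mentioned above (assuming plists:map/3 accepts a "malt" option such as {processes, N}, as its documentation describes):

%% Square a large list using at most four worker processes.
Squares = plists:map(fun(X) -> X * X end, lists:seq(1, 10000), {processes, 4}).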
In my view, there is no useful generic load-balancing tool in OTP, and perhaps it is only useful to have one in specific cases. It is easy enough to implement one yourself. plists may be useful in the same cases, but I do not believe in parallel libraries as a substitute for the real thing. Amdahl will haunt you forever if you walk this path.
The right number of worker processes is equal to the number of schedulers. This may vary depending on what other work is done on the system. Use
erlang:system_info(schedulers_online) -> NS
to get the number of schedulers.
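As a rough sketch of how that number can drive the amount of parallelism (chunk/2 is just a helper written for the example):

-module(chunked).
-export([pmap/2]).

%% Run F over List using one worker process per online scheduler.
pmap(F, List) ->
    N = erlang:system_info(schedulers_online),
    Chunks = chunk(List, N),
    Parent = self(),
    Pids = [spawn_link(fun() -> Parent ! {self(), lists:map(F, C)} end)
            || C <- Chunks],
    lists:append([receive {Pid, Result} -> Result end || Pid <- Pids]).

%% Split List into at most N chunks of roughly equal size.
chunk(List, N) ->
    Size = max(1, (length(List) + N - 1) div N),
    split_chunks(List, Size).

split_chunks([], _Size) ->
    [];
split_chunks(List, Size) when length(List) =< Size ->
    [List];
split_chunks(List, Size) ->
    {Chunk, Rest} = lists:split(Size, List),
    [Chunk | split_chunks(Rest, Size)].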
The notion of overhead when flooding the system with an abundance of worker processes is somewhat faulty. There is overhead with new processes, but not as much as with OS threads. The main overhead is message copying between processes; this can be alleviated by using binaries, since only a reference to the binary is sent. With other Erlang terms, the structure is first expanded and then copied to the other process.
There is no way to predict the cost of work mechanically without measuring it, i.e. actually doing it. Someone must determine how to partition the work for a given class of tasks. By the term "load balancer" I understand something very different from what is in your question.