Erlang supervisor processes - erlang

I have been learning Erlang intensively, and after finishing 'Programming Erlang' from Joe Armstrong, there is one thing that I keep coming back to.
In my mind a Supervisor spawns One process per child handler. So each declared gen_server type handler will run as a separate process.
What happens if you are building a tiny web server and you want each requests to be its own process. Do you still conform to OTP principles and use a gen_server somehow (how ?), or do you create your own behaviour?
How does Cowboy handle this for eg. ? Does it still use gen_server ?

tl;dr: I find that trying to figure out the "correct" supervision structure a the beginning of a project is a form of premature optimization.
The "right" way to design your supervision tree depends on what the worker parts of your application are doing. In the case of a web server I would probably first explore something along the lines of:
top supervisor (singular)
data service supervisor (one per service type)
worker pool (all workers under the service sup)
client connection supervisor (one)
connection worker pool (or one per connection, have to play with it to decide)
logical supervisor (as appropriate -- massive variance here, depending on problem domain)
workers or supervisors (as appropriate -- have to explore/know the problem domain to have any idea how this should be structured)
So that's several workers per supervisor type at the lower level. I haven't used Cowboy so I don't know how it is organized. The point I'm trying to make is that while the mechanics of handling data services serving web pages are relatively trivial, the part of the system that actually does the core problem-solving work might not be and this is going to dictate everything interesting about the system.
It is a bad thing to have your problem-solving bits mixed in the same module as your web-displaying or connection handling bits. Ideally you should be able to use the same logic units in a native application, a web application and a networked service without any changes.
Ultimately the answer to whether you should have 1:1 supervisors to workers or 1:n depends on what you're doing and what restart strategy gives you the best balance among recovery to a known consistent state, latency felt by the user, and resource usage.
One of my favorite things about Erlang is that I can start with a naive supervisor structure like the one above, play with it until I see where its not so good, and rather easily switch things around and experiment with alternatives without fundamentally altering my system much. (The same goes for playing with alternative data representations if you write proper abstractions around them.) So first, get something that works in testing. Then load it up and see if you can break it. Then start worrying about the details, after you understand where the problems actually are.

It is a common pattern to spawn one server per client in erlang, You will then use a supervisor using the simple_one_to_one strategy for the children servers. This allows to ask the server to start a server on_demand. Generally this is used when you don't know how many processes you will need, and when the processes are independent (a crash of one process should not impacts the other).
There is a very good information in the site learningyousomeerlang.com (LYSE supervisor chapter). the whole site is worth to read.

Related

Cowboy Message Queue Design

I have a web service written in Cowboy and I am planning to use RabbitMQ as the DB layer. So my Cowboy service will be one of the producer which writes to the queue and the consumer writes to the database. There are couple more asynchronous tasks that will come from another service (not Cowboy).
Now the question is where these consumers should go. Should these be part of single erlang app or should I create separate Erlang app for all the consumers.
Any advice would be highly appreciated.
Since Erlang is not the exclusive producer, and since one can usually imagine consumers running without knowledge of the producers, having separate applications is not a bad idea at all. You can have multiple top-level applications in a single Erlang release (that's what the dependencies are, really), so you can always put all the code in the same repository (I usually have a top level apps/ directory for these), and if needed later on split them out to separate repos.
Having them as separate applications certainly makes deciding later on to distribute the application across multiple erlang nodes easier: just start the relevant producer applications s on some nodes, and the consumer application on others.
So while either way will probably work, separate apps is probably a cleaner design and keeps the door open for future expansion in a slightly nicer way.

What makes erlang scalable?

I am working on an article describing fundamentals of technologies used by scalable systems. I have worked on Erlang before in a self-learning excercise. I have gone through several articles but have not been able to answer the following questions:
What is in the implementation of Erlang that makes it scalable? What makes it able to run concurrent processes more efficiently than technologies like Java?
What is the relation between functional programming and parallelization? With the declarative syntax of Erlang, do we achieve run-time efficiency?
Does process state not make it heavy? If we have thousands of concurrent users and spawn and equal number of processes as gen_server or any other equivalent pattern, each process would maintain a state. With so many processes, will it not be a drain on the RAM?
If a process has to make DB operations and we spawn multiple instances of that process, eventually the DB will become a bottleneck. This happens even if we use traditional models like Apache-PHP. Almost every business application needs DB access. What then do we gain from using Erlang?
How does process restart help? A process crashes when something is wrong in its logic or in the data. OTP allows you to restart a process. If the logic or data does not change, why would the process not crash again and keep crashing always?
Most articles sing praises about Erlang citing its use in Facebook and Whatsapp. I salute Erlang for being scalable, but also want to technically justify its scalability.
Even if I find answers to these queries on an existing link, that will help.
Regards,
Yash
Shortly:
It's unmutable. You have no variables, only terms, tuples and atoms. Program execution can be divided by breakpoint at any place. Fully transactional model.
Processes are even lightweight than .NET threads and isolated.
It's made for communications. Millions of connections? Fully asynchronous? Maximum thread safety? Big cross-platform environment, which built only for one purpose — scale&communicate? It's all Ericsson language — first in this sphere.
You can choose some impersonators like F#, Scala/Akka, Haskell — they are trying to copy features from Erlang, but only Erlang born from and born for only one purpose — telecom.
Answers to other questions you can find on erlang.com and I'm suggesting you to visit handbook. Erlang built for other aims, so it's not for every task, and if you asking about awful things like php, Erlang will not be your language.
I'm no Erlang developer (yet) but from what I have read about it some of the features that makes it very scalable is that Erlang has its own lightweight processes that are using message passing to communicate with each other. Because of this there is no such thing as shared state and locking which is the case when using for example a multi threaded Java application.
Another difference compared to Java is that the Erlang VM does garbage collection on every little process that is running which does not take any time at all compared to Java which does garbage collection only per VM.
If you get problem with bottlenecks from database connection you could start by using a database pooling app running against maybe a replicated PostgreSQL cluster or if you still have bottlenecks use a multi replicated NoSQL setup with Mnesia, Riak or CouchDB.
I think process restarts can be very useful when you are experiencing rare bugs that only appear randomly and only when specific criteria is fulfilled. Bugs that cause the application to crash as soon as you restart the app should optimally be fixed or taken care of with a circuit breaker so that it does not spread further.
Here is one way process restart helps. By not having to deal with all possible error cases. Say you have a program that divides numbers. Some guy enters a zero to divide by. Instead of checking for that possible error (and tons more), just code the "happy case" and let process crash when he enters 3/0. It just restarts, and he can figure out what he did wrong.
You an extend this into an infinite number of situations (attempting to read from a non-existent file because the user misspelled it, etc).
The big reason for process restart being valuable is that not every error happens every time, and checking that it worked is verbose.
Error handling is verbose typically, so writing it interspersed with the logic handling doing a task can make it harder to understand the code. Moving that logic outside of the task allows you to more clearly distinguish between "doing things" code, and "it broke" code. You just let the thing that had a problem fail, and handle it as needed by a supervising party.
Since most errors don't mean that the entire program must stop, only that that particular thing isn't working right, by just restarting the part that broke, you can keep operating in a state of degraded functionality, instead of being down, while you repair the problem.
It should also be noted that the failure recovery is bounded. You have to lay out the limits for how much failure in a certain period of time is too much. If you exceed that limit, the failure propagates to another level of supervision. Each restart includes doing any needed process initialization, which is sometimes enough to fix the problem. For example, in dev, I've accidentally deleted a database file associated with a process. The crashes cascaded up to the level where the file was first created, at which point the problem rectified itself, and everything carried on.

which behaviour should be implemented?

So we have generic part and the specific part thanks to erlang/OTP behaviours which greatly simplifies our code and gives us a common skeleton that everybody understands.
I am an Erlang newbie and trying to get my head around concepts of OTP.
But amongst
gen_server
gen_fsm
gen_event
supervisor
which behaviour is suitable for what kind of application.
The reason why I am asking this question is because after going through some verbose theory,I am still unsure about which is more suited for a XYZ application.
Is there any criteria or a Rule of thumb,or is it just the programmers instincts that matter in this case?
Any help?
Remark, it is a very general question that doesn't fit well to the usage of this community.
Supervisor are here to control the way your application starts and evolves in time.
It starts processes in the right order with a verification that everything is going fine, manage the restart strategy in case of failure, pool of processes, supervision tree... The advantages are that it helps you to follow the "let it crash" rule in your application, and it concentrates in a few line of code the complex task to coordinate all your processes.
I use gen_event mainly for logging purposes. You can use this behavior to connect on demand different handlers depending on your target. In one of my application, I have by default a graphical logger that allows to check the behavior at a glance, and a textual file handler that allows deeper and post mortem analysis.
Gen_server is the basic brick of all applications. It allows to keep a state, react on call, cast and even simple messages. It manages the code changes and provides features for start and stop. Most of my modules are gen_servers, you can see it as an actor which is responsible to execute a single task.
Gen_fsm brings to gen server the notion of a current state with a specific behavior, and transition from state to state. It can be used for example in a game to control the evolution of the game (configuration, waiting for players, play, change level...)
When you create an application you will identify the different tasks yo have to do and assign them to either gen_server either gen_fsm. You will add gen_event to allow publish/subscribe mechanism. Then you will have to group all processes considering their dependency and error recovering. This will give you the supervision tree and restart strategy.
I use following rules.
If you have some long-living entity that serves multiple clients in request-response manner - you use gen_server.
If this long-living entity should support more complex protocol that just request/response, there is good idea to use gen_fsm. I like use gen_fsm for network protocol implementation, for example.
If you need some abstract event bus, where any process can publish messages, and any client may want receive this messages, you need gen_event.
If you have some proceeses which must be restarted together (for example, if one of them fails), you should put this processes under supervisor, which ties this processes together.

Erlang OTP based application - architecture ideas

I'm trying to write an Erlang application (OTP) that would parse a list of users and then launch workers that will work 24X7 to collect user-data (using three different APIs) from remote servers and store it in ets.
What would be the ideal architecture for this kind of application. Do I launch a bunch of workers - one for each user (assuming small number users)? What will happen if number of users increases very rapidly?
Also, to call different APIs I need to put up a Timer mechanism in the worker process.
Any hint will be really appreciated.
Spawning new process for each user is not a such bad idea. There are http servers that do this for each connection, and they doing quite fine.
First of all cost of creating new process is minimal. And cost of maintaining processes is even smaller. If one of the has nothing to do, it won't do anything; there is none (almost) runtime overhead from inactive processes, which in the end means that you are doing only the work you have to do (this is in fact the source of Erlang systems reactivity).
Some issue might be memory usage. Each process has it's own memory stack, and in use-case when they actually do not need to store any internal data, you might be allocating some unnecessary memory. But this also could be modified (even during runtime), and in most cases such memory will be garbage collected.
Actually I would not worry about such things too soon. Issues you might encounter might depend on many things, mostly amount of outside data or user activity, and you can not really design this. Most probably you won't encounter any of them for quite some time. There's no need for premature optimization, especially if you could bind yourself to design that would slow down rest of your development process. In Erlang, with processes being main source of abstraction you can easily swap this process-per-user with pool-of-workers, and ets with external service. But only if you really need it.
What's most important is fact that representing "user" as process would be closest to problem domain. "Users" are independent entities, and deserve separate processes (they have their own state, and they can act or react independent to each other). It is quite similar to using Objects and Classes in other languages (it is over-simplification, but it should get you going).
If you were writing this in Python or C++ would you worry about how many objects you were creating? Only in extreme cases. In Erlang the same general rule applies for processes. Don't worry about how many you are creating.
As for architecture, the only element that is an architectural issue in your question is whether you should design a fixed worker pool or a 1-for-1 worker pool. The shape of the supervision tree would be an outcome of whichever way you choose.
If you are scraping data your real bottleneck isn't going to be how many processes you have, it will be how many network requests you are able to make per second on each API you are trying to access. You will almost certainly get throttled.
(A few months ago I wrote a test demonstration of a very similar system to what you are describing. The limiting factor was API request limits from providers like fb, YouTube, g+, Yahoo, not number of processes.)
As always with Erlang, write some system first, and then benchmark it for real before worrying about performance. You will usually find that performance isn't an issue, and the times that it is you will discover that it is much easier to optimize one small part of an existing system than to design an optimized system from scratch. So just go for it and write something that basically does what you want right now, and worry about optimization tweaks after you have something that basically does what you want. After getting some concrete performance data (memory, request latency, etc.) is the time to start thinking about performance.
Your problem will almost certainly be on the API providers' side or your network latency, not congestion within the Erlang VM.

Using Erlang, how should I distribute load amongst a cluster?

I was looking at the slave/pool modules and it seems similar to what I
want, but it also seems like I have a single point of failure in my
application (if the master node goes down).
The client has a list of gateways (for the sake of fallback - all do
the same thing) which accept connections, and one is chosen from
randomly by the client. When the client connects all nodes are
examined to see which has the least load and then the IP of the least-
loaded server is forwarded back to the client. The client then
connects to this server and everything is executed there.
In summary, I want all nodes to act as both gateways and to actually
process client requests. The load balancing is only done when the
client initially connects - all of the actual packets and processed on
the client's "home" node.
How would I do this?
I don't know if there is this modules implemented yet but what I can say, load balance is overrated. What I can argue is, random placing of jobs is best bet unless you know far more information how load will come in future and in most of cases you really doesn't. What you wrote:
When the client connects all nodes are examined to see which has the least load and then the IP of the least- loaded server is forwarded back to the client.
How you know that all those least loaded node will not be highest loaded just in next ms? How you know that all those high loaded nodes which you will not include in list will not drop load just in next ms? You really can't know it unless you have very rare case.
Just measure (or compute) your node's performance and set node's probability be chosen depend of it. Choose node randomly regardless of current load. Use this as initial approach. When you set it up, then you can try make up some more sophisticated algorithm. I bet that it will be very hard work to beat this initial approach. Trust me, very hard.
Edit: To be more clear in one subtle detail, I strongly argue that you can't predict future load from current and historical load but you should use knowledge about tasks durations probability and current decomposition of task's lifetime. This work is so hard to try achieve.
The purpose of a supervision tree is to manage the processes not necessarily forward requests. There is no reason you couldn't use different code to send requests directly to members of the list of available processes. See the pool:get_nodes or pool:get_node() functions for one way to get those lists.
You can let the pool module handle the management of the processes (restarting, monitoring, and killing processing) and use some other module to transparently redirect requests to the pool of processes. Maybe you were looking for distributed pools though? It'll be hard to get away from the master process in erlang whithout going to distributed nodes. The whole running system is pretty much one large supervision tree.
I recently remembered the pg module which allows you to setup process groups. messages sent to the group go to every process in the group. It might get you part way toward what you want. you would have to write the code to decide which process handles the request for real but you would get a pool without a master using it.

Resources