Erlang OTP based application - architecture ideas - erlang

I'm trying to write an Erlang application (OTP) that would parse a list of users and then launch workers that will work 24X7 to collect user-data (using three different APIs) from remote servers and store it in ets.
What would be the ideal architecture for this kind of application. Do I launch a bunch of workers - one for each user (assuming small number users)? What will happen if number of users increases very rapidly?
Also, to call different APIs I need to put up a Timer mechanism in the worker process.
Any hint will be really appreciated.

Spawning new process for each user is not a such bad idea. There are http servers that do this for each connection, and they doing quite fine.
First of all cost of creating new process is minimal. And cost of maintaining processes is even smaller. If one of the has nothing to do, it won't do anything; there is none (almost) runtime overhead from inactive processes, which in the end means that you are doing only the work you have to do (this is in fact the source of Erlang systems reactivity).
Some issue might be memory usage. Each process has it's own memory stack, and in use-case when they actually do not need to store any internal data, you might be allocating some unnecessary memory. But this also could be modified (even during runtime), and in most cases such memory will be garbage collected.
Actually I would not worry about such things too soon. Issues you might encounter might depend on many things, mostly amount of outside data or user activity, and you can not really design this. Most probably you won't encounter any of them for quite some time. There's no need for premature optimization, especially if you could bind yourself to design that would slow down rest of your development process. In Erlang, with processes being main source of abstraction you can easily swap this process-per-user with pool-of-workers, and ets with external service. But only if you really need it.
What's most important is fact that representing "user" as process would be closest to problem domain. "Users" are independent entities, and deserve separate processes (they have their own state, and they can act or react independent to each other). It is quite similar to using Objects and Classes in other languages (it is over-simplification, but it should get you going).

If you were writing this in Python or C++ would you worry about how many objects you were creating? Only in extreme cases. In Erlang the same general rule applies for processes. Don't worry about how many you are creating.
As for architecture, the only element that is an architectural issue in your question is whether you should design a fixed worker pool or a 1-for-1 worker pool. The shape of the supervision tree would be an outcome of whichever way you choose.
If you are scraping data your real bottleneck isn't going to be how many processes you have, it will be how many network requests you are able to make per second on each API you are trying to access. You will almost certainly get throttled.
(A few months ago I wrote a test demonstration of a very similar system to what you are describing. The limiting factor was API request limits from providers like fb, YouTube, g+, Yahoo, not number of processes.)
As always with Erlang, write some system first, and then benchmark it for real before worrying about performance. You will usually find that performance isn't an issue, and the times that it is you will discover that it is much easier to optimize one small part of an existing system than to design an optimized system from scratch. So just go for it and write something that basically does what you want right now, and worry about optimization tweaks after you have something that basically does what you want. After getting some concrete performance data (memory, request latency, etc.) is the time to start thinking about performance.
Your problem will almost certainly be on the API providers' side or your network latency, not congestion within the Erlang VM.

Related

Async App Server versus Multiple Blocking Servers

tl;dr Many Rails apps or one Vertx/Play! app?
I've been having discussions with other members of my team on the pros and cons of using an async app server such as the Play! Framework (built on Netty) versus spinning up multiple instances of a Rails app server.
I know that Netty is asynchronous/non-blocking, meaning during a database query, network request, or something similar an async call will allow the event loop thread to switch from the blocked request to another request ready to be processed/served. This will keep the CPUs busy instead of blocking and waiting.
I'm arguing in favor or using something such as the Play! Framework or Vertx.io, something that is non-blocking... Scalable. My team members, on the other hand, are saying that you can get the same benefit by using multiple instances of a Rails app, which out of the box only comes with one thread and doesn't have true concurrency as do apps on the JVM. They are saying just use enough app instances to match the performance of one Play! application (or however many Play! apps we use), and when a Rails app blocks the OS will switch processes to a different Rails app. In the end, they are saying that the CPUs will be doing the same amount of work and we will get the same performance.
So here are my questions:
Are there any logical fallacies in the arguments above? Would the OS manage the Rails app instances as well as Netty (which also runs on the JVM, which maps threads to cores very well) manages requests in its event loop?
Would the OS be as performant in switching on blocking calls as would something like Netty or Vertx, or even something built on Ruby's own EventMachine?
With enough Rails app instances to match the performance Play! apps, would there be a cost noticeable cost difference in running the servers? If there are no cost difference it wouldn't really matter what method is used, in my opinion. Shoot if it was cheaper financially to run up a million Rails apps than one Play! app I would rather do that.
What are some other benefits to using either of these approaches that I may be failing to ask about?
Both approaches can and have worked. So if switching would incur a high development cost and/or schedule hit then it's probably not worth the effort...yet. Make the switch when the costs become unacceptably high. Think of using microservices as a gradual switching strategy.
If you are early on in your development cycle then making the switch early may make sense. Rewriting is a pain.
Or perhaps you'll never have to switch and rails will work for your use case like a charm. And you've been so successful at making your customers happy that the cash is just rolling in.
Some of the downsides of a blocking single server approach:
Increased memory usage. Sources: multiple processes, memory leaks, lack of shared datastructures (which increases communication costs and brings up consistency issues).
Lack of parallelism. This has two consequences: more boxes and more latency. You'll need potentially a much larger box count to handle the same load. So if you need to scale and have money concerns then this can be a problem. If it isn't a concern then it doesn't matter. In the server it means increased latency, the sort of latency which can't be improved by multiplying processes, which may be a killer argument depending on your app.
Some examples of those who had made such a switch from rails to node.js and golang:
LinkedIn Moved From Rails To Node: 27 Servers Cut And Up To 20x Faster : http://highscalability.com/blog/2012/10/4/linkedin-moved-from-rails-to-node-27-servers-cut-and-up-to-2.html
Why Timehop Chose Go to Replace Our Rails App : https://medium.com/building-timehop/why-timehop-chose-go-to-replace-our-rails-app-2855ea1912d
How We Moved Our API From Ruby to Go and Saved Our Sanity : http://blog.parse.com/learn/how-we-moved-our-api-from-ruby-to-go-and-saved-our-sanity/
How We Went from 30 Servers to 2: Go : http://www.iron.io/blog/2013/03/how-we-went-from-30-servers-to-2-go.html
These posts represent arguments that are probably illustrative of what your group is going through. The decision is unfortunately not an obvious one.
It depends on the nature of what you are building, the nature of your team, the nature of resources, the nature of your skills, the nature of your goals and how you value all the different tradeoffs.
Would costs really drop? Isn't the same amount of computation done no matter the number of servers?
Depends on the type and scale of the work being done. Typically web services are IO bound, waiting on responses from other services like databases, caches, etc.
If you are using a single threaded server the process is blocked on IO a lot so it is doing nothing a lot. In contrast the nonblocking server will be able to handle many many requests while the single threaded server is blocked. You can keep adding processes, but there are only so many processes a single machine can run. A nonblocking server can have the same number of processes while keeping the CPU busy as possible handling requests. It's often possible to handle higher loads on smaller cheaper machines when using nonblocking servers.
If your expected request rate can be handled by an acceptable number of boxes and you don't expect huge spikes then you would be fine with single threaded servers. Nonblocking servers are great at soaking up load spikes without necessarily having to add machines.
If your work is such that response latencies don't really matter then you can get by with fewer nodes.
If your workload is CPU bound then you'll need more boxes anyway because there won't be the same opportunity for parallelism because the servers won't be blocking on IO.

What makes erlang scalable?

I am working on an article describing fundamentals of technologies used by scalable systems. I have worked on Erlang before in a self-learning excercise. I have gone through several articles but have not been able to answer the following questions:
What is in the implementation of Erlang that makes it scalable? What makes it able to run concurrent processes more efficiently than technologies like Java?
What is the relation between functional programming and parallelization? With the declarative syntax of Erlang, do we achieve run-time efficiency?
Does process state not make it heavy? If we have thousands of concurrent users and spawn and equal number of processes as gen_server or any other equivalent pattern, each process would maintain a state. With so many processes, will it not be a drain on the RAM?
If a process has to make DB operations and we spawn multiple instances of that process, eventually the DB will become a bottleneck. This happens even if we use traditional models like Apache-PHP. Almost every business application needs DB access. What then do we gain from using Erlang?
How does process restart help? A process crashes when something is wrong in its logic or in the data. OTP allows you to restart a process. If the logic or data does not change, why would the process not crash again and keep crashing always?
Most articles sing praises about Erlang citing its use in Facebook and Whatsapp. I salute Erlang for being scalable, but also want to technically justify its scalability.
Even if I find answers to these queries on an existing link, that will help.
Regards,
Yash
Shortly:
It's unmutable. You have no variables, only terms, tuples and atoms. Program execution can be divided by breakpoint at any place. Fully transactional model.
Processes are even lightweight than .NET threads and isolated.
It's made for communications. Millions of connections? Fully asynchronous? Maximum thread safety? Big cross-platform environment, which built only for one purpose — scale&communicate? It's all Ericsson language — first in this sphere.
You can choose some impersonators like F#, Scala/Akka, Haskell — they are trying to copy features from Erlang, but only Erlang born from and born for only one purpose — telecom.
Answers to other questions you can find on erlang.com and I'm suggesting you to visit handbook. Erlang built for other aims, so it's not for every task, and if you asking about awful things like php, Erlang will not be your language.
I'm no Erlang developer (yet) but from what I have read about it some of the features that makes it very scalable is that Erlang has its own lightweight processes that are using message passing to communicate with each other. Because of this there is no such thing as shared state and locking which is the case when using for example a multi threaded Java application.
Another difference compared to Java is that the Erlang VM does garbage collection on every little process that is running which does not take any time at all compared to Java which does garbage collection only per VM.
If you get problem with bottlenecks from database connection you could start by using a database pooling app running against maybe a replicated PostgreSQL cluster or if you still have bottlenecks use a multi replicated NoSQL setup with Mnesia, Riak or CouchDB.
I think process restarts can be very useful when you are experiencing rare bugs that only appear randomly and only when specific criteria is fulfilled. Bugs that cause the application to crash as soon as you restart the app should optimally be fixed or taken care of with a circuit breaker so that it does not spread further.
Here is one way process restart helps. By not having to deal with all possible error cases. Say you have a program that divides numbers. Some guy enters a zero to divide by. Instead of checking for that possible error (and tons more), just code the "happy case" and let process crash when he enters 3/0. It just restarts, and he can figure out what he did wrong.
You an extend this into an infinite number of situations (attempting to read from a non-existent file because the user misspelled it, etc).
The big reason for process restart being valuable is that not every error happens every time, and checking that it worked is verbose.
Error handling is verbose typically, so writing it interspersed with the logic handling doing a task can make it harder to understand the code. Moving that logic outside of the task allows you to more clearly distinguish between "doing things" code, and "it broke" code. You just let the thing that had a problem fail, and handle it as needed by a supervising party.
Since most errors don't mean that the entire program must stop, only that that particular thing isn't working right, by just restarting the part that broke, you can keep operating in a state of degraded functionality, instead of being down, while you repair the problem.
It should also be noted that the failure recovery is bounded. You have to lay out the limits for how much failure in a certain period of time is too much. If you exceed that limit, the failure propagates to another level of supervision. Each restart includes doing any needed process initialization, which is sometimes enough to fix the problem. For example, in dev, I've accidentally deleted a database file associated with a process. The crashes cascaded up to the level where the file was first created, at which point the problem rectified itself, and everything carried on.

Erlang supervisor processes

I have been learning Erlang intensively, and after finishing 'Programming Erlang' from Joe Armstrong, there is one thing that I keep coming back to.
In my mind a Supervisor spawns One process per child handler. So each declared gen_server type handler will run as a separate process.
What happens if you are building a tiny web server and you want each requests to be its own process. Do you still conform to OTP principles and use a gen_server somehow (how ?), or do you create your own behaviour?
How does Cowboy handle this for eg. ? Does it still use gen_server ?
tl;dr: I find that trying to figure out the "correct" supervision structure a the beginning of a project is a form of premature optimization.
The "right" way to design your supervision tree depends on what the worker parts of your application are doing. In the case of a web server I would probably first explore something along the lines of:
top supervisor (singular)
data service supervisor (one per service type)
worker pool (all workers under the service sup)
client connection supervisor (one)
connection worker pool (or one per connection, have to play with it to decide)
logical supervisor (as appropriate -- massive variance here, depending on problem domain)
workers or supervisors (as appropriate -- have to explore/know the problem domain to have any idea how this should be structured)
So that's several workers per supervisor type at the lower level. I haven't used Cowboy so I don't know how it is organized. The point I'm trying to make is that while the mechanics of handling data services serving web pages are relatively trivial, the part of the system that actually does the core problem-solving work might not be and this is going to dictate everything interesting about the system.
It is a bad thing to have your problem-solving bits mixed in the same module as your web-displaying or connection handling bits. Ideally you should be able to use the same logic units in a native application, a web application and a networked service without any changes.
Ultimately the answer to whether you should have 1:1 supervisors to workers or 1:n depends on what you're doing and what restart strategy gives you the best balance among recovery to a known consistent state, latency felt by the user, and resource usage.
One of my favorite things about Erlang is that I can start with a naive supervisor structure like the one above, play with it until I see where its not so good, and rather easily switch things around and experiment with alternatives without fundamentally altering my system much. (The same goes for playing with alternative data representations if you write proper abstractions around them.) So first, get something that works in testing. Then load it up and see if you can break it. Then start worrying about the details, after you understand where the problems actually are.
It is a common pattern to spawn one server per client in erlang, You will then use a supervisor using the simple_one_to_one strategy for the children servers. This allows to ask the server to start a server on_demand. Generally this is used when you don't know how many processes you will need, and when the processes are independent (a crash of one process should not impacts the other).
There is a very good information in the site learningyousomeerlang.com (LYSE supervisor chapter). the whole site is worth to read.

Is no horizontal scalability when it comes to writing a RDBMS defect? or does it happen to all DBMS'?

When you hit a roof on reading from a database, you have two choices, scale vertically by putting more hardware in the server, or scale horizontally by putting a second server to help offload the reads.
Offloading reads to a second server, means that all writes will hit both servers, while read only hits one.
Problem is when you hit a roof with writing, since writing has to happen to all servers, it means that all servers will be overloaded with write requests, and the server comes unusable. Adding more servers to the problem doesn't help, since it only adds more servers that will be overloaded. So you have to scale vertically.
Is this something that is specific to RDBMS'? or is it something that happens with all DBMS'?
I know you can do things on software side, and split the database in two, eg. all entries starting with 0-m in one db while n-z in another, but IMHO it is more of a workaround than a solution to the problem.
I can't see that this would be specific to the relational model. All databases that have to read and write (and that's most of them) will have a similar problem.
For what it's worth, most databases are read far more than written so the write roof occurs less frequently than you might think. In addition, load balancing databases as per your method tends to be an immediate write to the primary with queued writes to all secondaries (at least in my experience).
In that case, you're not actually waiting around for multiple writes as a user, you just wait for the first. The DBMS itself manages the synchronisation between instances. This of course means that secondary databases might not be totally up-to-date but this can be controlled. Technically, this breaks the ACID properties of the system as a whole but this can be architected around.
I think this is the case with any DBMS, although some handle it better than others. Like you mention, partitioning the database in software seems to be the most common solution to this.
In many applications though, partitioning the database like that makes sense anyways if you are at such a huge scale that it becomes necessary. For example, if you had a social networking app, it would probably make sense to partition your database by country or other geographical regions. This would allow you to have your servers located geographically close to the regions they serve. It would also help mitigate any problems with a cross-database "social graph" since peoples friends tend to live nearby.
You're hardly going to "hit a roof with writing, since writing has to happen to all server" because in most of RDBMS installations:
1) Reads are overwhelming more frequent than writes
2) Modern RDBMs have Multi-Version Concurrency Control able to reduce blocking when reading/writing

Process balancing in Erlang

Does anybody knows if there is a sort of 'load-balancer' in the erlang standard library? I mean, if I have some really simple operations on a really large set of data, the overhead of constructing a process for every item will be larger than perform the operation sequentially. But if I can balance the work in the 'right number' of process, it will perform better, so I'm basically asking if there is an easy way to accomplish this task.
By the way, does anybody knows if an OTP application does some kind of balance load? I mean, in an OTP application there is the concept of a "worker process" (like a java-ish thread worker)?
See modules pg2 and pool.
pg2 implements quite simple distributed process pool. pg2:get_closest_pid/1 returns "closest" pid, i.e. random local process if available, otherwise random remote process.
pool implements load balancing between nodes started with module slave.
The plists module probably does what you want. It is basically a parallel implementation of the lists module, design to be used as a drop-in replacement. However, you can also control how it parallelizes its operations, for example by defining how many worker processes should be spawned etc.
You probably would do it by calculating some number of workers depending on the length of the list or the load of the system etc.
From the website:
plists is a drop-in replacement for
the Erlang module lists, making most
list operations parallel. It can
operate on each element in parallel,
for IO-bound operations, on sublists
in parallel, for taking advantage of
multi-core machines with CPU-bound
operations, and across erlang nodes,
for parallizing inside a cluster. It
handles errors and node failures. It
can be configured, tuned, and tweaked
to get optimal performance while
minimizing overhead.
There is no, in my view, usefull generic load-balancing tool in otp. And perhaps it only usefull to have one in specific cases. It is easy enough to implement one yourself. plists may be useful in the same cases. I do not believe in parallel-libraries as a substitute to the real thing. Amdahl will haunt you forever if you walk this path.
The right number of worker processes is equal to the number of schedulers. This may vary depending of what other work is done on the system. Use,
erlang:system_info(schedulers_online) -> NS
to get the number of schedulers.
The notion of overhead when flooding the system with an abundance of worker processes is somewhat faulty. There is overhead with new processes but not as much as with os-threads. The main overhead is message copying between processes, this can be alleviated with the use of binaries since only the reference to the binary is sent. With eterms the structure is first expanded then copied to the other process.
There is no way how to predict cost of work mechanically without measure it e.g do it. Some person must determine how to partition work for some class of tasks. In load balancer word I understand something very different than in your question.

Resources