I'm a newbie to Erlang. In Erlang, a node is represented by an atom, like 'name@host'. I want to ask: how can a node communicate with other nodes without the number of atoms on it increasing?
I want to build a highly distributed storage system which may contain thousands of nodes.
For a specified node A, it can send/receive messages to/from any other nodes in the cluster, for example:
rpc:call(Node, Module, Function, [])
But as nodes join and leave the cluster, node A may end up having communicated with thousands of nodes,
so the number of atoms on node A will keep increasing and eventually reach the limit.
How can I prevent this from happening?
If I use a Pid instead of the node name to communicate, for example,
Pid ! Message
Will this approach increase the number of atoms on node A? It is said that a Pid contains information about its remote node.
The default maximum number of atoms is 1,048,576, and you can raise it with the +t emulator flag; see the Erlang docs.
There's no chance you'll hit the limit with Erlang clustering. If you were to scale into the tens-of-thousands-to-millions range, you would very likely have several separate clusters anyway.
Distributed Erlang keeps the cluster alive with TCP heartbeats between nodes. You probably don't really want a single cluster of more than a few thousand nodes.
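If you want to keep an eye on atom usage anyway, here is a minimal sketch (assuming OTP 20+, where atom_count and atom_limit are available through erlang:system_info/1):

$ erl +t 2097152                      %% start the VM with a doubled atom table
1> erlang:system_info(atom_limit).    %% maximum number of atoms this VM allows
2097152
2> erlang:system_info(atom_count).    %% atoms currently in use; watch this as nodes connect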
Related
Would native Erlang messages provide reasonable performance when there are lots of nodes or binary data?
Case 1: There's a dynamic pool of about 50-200 machines (Erlang nodes). It's constantly changing, with about 5-50 machines added or removed every 10 minutes.
Case 2: Let's say we are using this cluster to build a YouTube clone and plan to stream video data via messages.
By reasonable performance I mean: it's OK to be 2-3 times slower than the best performance achievable with complex Erlang code; 10 times slower is not OK.
There is no significant difference between sending a message and sending binary data. A message is just transformed into a binary packet using term_to_binary and sent via TCP, and the same applies to binary data. (Well, it is a little smarter than that, because the textual form of the same atoms is not sent again and again, as a plain term_to_binary would do.) So the difference is negligible.
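You can inspect the wire encoding yourself with term_to_binary/1; a minimal sketch (the video_chunk message shape is made up for illustration):

1> Chunk = binary:copy(<<0>>, 1024).                    %% 1 KB payload
2> byte_size(term_to_binary(Chunk)).                    %% raw binary: 1 KB plus 6 bytes of header
1030
3> byte_size(term_to_binary({video_chunk, 42, Chunk})). %% wrapping it in a tuple adds only a few more bytes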
There are important details:
1) In clusters of over 100 nodes, ping noise in a fully connected cluster will be a significant part of network traffic. Even bigger deployments require deep changes to the Erlang VM and the OS.
2) If you want to stream video or audio, you need to plan the capacity of a single node: clients per node, TCP/UDP packet rate, network bandwidth.
3) There is a performance limit of roughly 150-200K messages/s between 2 processes on different nodes; a rough way to measure this on your own hardware is sketched below.
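A minimal measurement sketch, assuming the two nodes are already connected and this pingpong module is loaded on both of them (all names are made up for illustration):

-module(pingpong).
-export([bench/2]).

%% Spawn an echo process on the remote node, then time N round trips.
bench(RemoteNode, N) ->
    Echo = spawn(RemoteNode,
                 fun Loop() -> receive {From, Msg} -> From ! Msg, Loop() end end),
    T0 = erlang:monotonic_time(microsecond),
    [begin Echo ! {self(), ping}, receive ping -> ok end end || _ <- lists:seq(1, N)],
    T1 = erlang:monotonic_time(microsecond),
    %% Each round trip is one message each way, so 2*N messages in total.
    (2 * N * 1000000) div max(T1 - T0, 1).

Running pingpong:bench('b@host', 100000). then gives an approximate messages-per-second figure for your own setup.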
I have a two-server Aerospike cluster running with a replication factor of 2. Both servers have the same replicated-objects count, which means all records are replicated. But the monitoring panel still shows incoming and outgoing migrations going on.
This happened after I restarted one of the servers. Now the I/O rate on both servers is higher than it was before the restart.
Why is this happening?
When a node leaves the cluster, the partition version of any partition that node was a member of advances. When the node returns, it shares its partition info with the cluster, and migrations are required for any partition the returning node is a member of. This is done because, while the node was down, the remaining node may have taken on writes.
For replication factor 2 with 2 nodes, both nodes are members of all partitions.
Part A:
Erlang has a lot of success stories about running concurrent agents e.g. the millions of simultaneous Facebook chats. That's millions of agents, but of course it's not millions of CPUs across a network. I'm having trouble finding metrics on how well Erlang scales when scaling is "horizontal" across a LAN/WAN.
Let's assume that I have many (tens of thousands) physical nodes (running Erlang on Linux) that need to communicate and synchronize small infrequent amounts of data across the LAN/WAN. At what point will I have communications bottlenecks, not between agents, but between physical nodes? (Or will this just work, assuming a stable network?)
Part B:
I understand (as an Erlang newbie, meaning I could be totally wrong) that Erlang nodes attempt to all connect to and be aware of each other, resulting in an N^2 point-to-point connection network. Assuming that part A won't just work with N in the tens of thousands, can Erlang be configured easily (using out-of-the-box config or trivial boilerplate, not writing a full implementation of grouping/routing algorithms myself) to cluster nodes into manageable groups and route system-wide messages through the cluster/group hierarchy?
We should specify that we are talking about the horizontal scalability of physical machines -- that's the only problem. CPUs on one machine will be handled by one VM, no matter how many of them there are.
node = machine.
To begin, I can say that 30-60 nodes is what you get out of the box (vanilla OTP installation) with any custom application written on top of that (in Erlang). Proof: ejabberd.
~100-150 is possible with an optimized custom application. I mean, it has to be good code, written with knowledge of GC, the characteristics of data types, message passing, etc.
Over ~150 is still all right, but when we talk about numbers like 300 or 500, it will require optimizations and customizations of the TCP layer. Also, our app has to be aware of the cost of, e.g., sync calls across the cluster.
The other thing is the DB layer. Mnesia (built-in), due to its features, will not be effective beyond 20 nodes (my experience -- I may be wrong). Solution: just use something else: Dynamo-style DBs, a separate cluster of MySQLs, HBase, etc.
The most common technique for balancing the cost of building a high-quality application against scalability is a federation of clusters of ~20-50 nodes each. So internally it's an efficient mesh of ~50 Erlang nodes, connected via any suitable protocol to N other 50-node clusters. To sum up, such a system is a federation of N Erlang clusters.
Distributed Erlang is designed to run in one data center. If you need more, geographically distant nodes, then use federations.
There are config options, e.g. for not connecting all nodes to each other. They may be helpful; however, in a ~50-node cluster the Erlang overhead is not significant. You can also create a graph of Erlang nodes using 'hidden' connections, which do not join the full mesh, but such a node also cannot benefit from being connected to all nodes.
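A minimal sketch of a hidden node (the node names are made up; -sname and -hidden are documented erl flags):

$ erl -sname relay -hidden               %% this node will not join the full mesh
1> net_kernel:connect_node('a@host1').   %% explicit, pairwise connection
true
2> nodes().                              %% hidden connections are not listed here...
[]
3> nodes(hidden).                        %% ...but here
['a@host1']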
The biggest problem I see in this kind of system is designing it as a masterless system. If you do not need that, everything should be OK.
I have several Erlang applications on node A, and they make rpc calls to node B, on which I have Mnesia stored procedures (database querying functions) as well as my Mnesia DB. Now, occasionally, the number of simultaneous processes making rpc calls to node B for data can rise to 150. I have several questions:
Question 1: For each rpc call to a remote node, does node A make a completely new connection (say TCP/IP or UDP, or whatever is used at the transport layer)? Or is there only one connection that all rpc calls share (since node A and node B are connected [something to do with that epmd process])?
Question 2: If I have data-centric applications on one node and a centrally managed Mnesia database on another, and these applications' tables share the same schema (which may be replicated, fragmented, indexed, etc.), which is the better option: to use rpc calls to fetch data from the data nodes to the application nodes, or to develop a whole new framework using, say, TCP/IP (the way the Scalaris guys did it for their failure detector) to cater for network latency problems?
Question 3: Has anyone out there ever tested or benchmarked rpc call efficiency in a way that can answer the following?
(a) What is the maximum number of simultaneous rpc calls an Erlang node can push onto another without breaking down?
(b) Is there a way of increasing this number, either through a system configuration or an operating system setting? (refer to OpenSolaris for x86 in your answer)
(c) Is there any other way for applications to request data from Mnesia running on remote Erlang nodes, other than rpc? (say CORBA, REST [requires HTTP end-to-end], Megaco, SOAP, etc.)
Mnesia runs over Erlang distribution, and in Erlang distribution there is only one TCP/IP connection between any pair of nodes (usually in a full-mesh arrangement, so one connection for every pair of nodes). All rpc/inter-node communication happens over this distribution connection.
Additionally, message ordering is guaranteed to be preserved between any pair of communicating processes over distribution. Ordering between more than two processes is not defined.
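So many concurrent calls simply multiplex over that single connection; a minimal sketch using rpc:async_call/4 (the node, module, and function names are made up for illustration):

%% Issue 150 concurrent calls to node B; they all share one distribution connection.
1> Keys = [rpc:async_call('b@host', db_queries, fetch, [Id]) || Id <- lists:seq(1, 150)].
2> Results = [rpc:yield(Key) || Key <- Keys].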
Mnesia gives you a lot of options for data placement. If you want your persistent storage on node B but processing done on node A, you could have disc_only_copies of your tables on node B and ram_copies on node A. That way applications on node A get quick access to data, and you still get durable copies on node B.
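A minimal sketch of such a table definition (assuming a schema already exists on both nodes; the record and node names are made up):

-record(user, {id, name}).

create_user_table() ->
    mnesia:create_table(user,
        [{attributes, record_info(fields, user)},
         {ram_copies,       ['a@host']},    %% fast access where the applications run
         {disc_only_copies, ['b@host']}]).  %% durable storage on the DB node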
I'm assuming that the network between A and B is a reliable LAN that is seldom going to partition (otherwise you're going to spend a bunch of time getting mnesia back online after a partition).
If both A and B are running mnesia, then I would let mnesia do all the RPC for me - this is what mnesia is built for and it has a number of optimizations. I wouldn't roll my own RPC or distribution mechanism without a very good reason.
As for benchmarks, this is entirely dependent on your hardware, mnesia schema and network between nodes (as well as your application's data access patterns). No one can give you these benchmarks, you have to run them yourself.
As for other RPC mechanisms for accessing mnesia, I don't think there are any out of the box, but there are many RPC libraries you could use to present the mnesia API to the network with a small amount of effort on your part.
Hi
I am implementing an IM-server-like application in Erlang. I use one agent process for each client connecting to the server, and the agent process is responsible for sending messages to a message gateway, which in turn sends each message on to another agent process. It seems that Erlang inter-process messaging is implemented as TCP connections, so there would be one connection from each agent process to the gateway. Does this mean that the number of agents on a single machine can never exceed 65,535, due to the restriction on port numbers?
Thanks in advance!
Some limits for Erlang are listed in the Efficiency Guide, User's Guide / 10 Advanced:
Distributed nodes
Known nodes
A remote node Y has to be known to node X if there exist any pids, ports, references, or funs (Erlang data types) from Y on X, or if X and Y are connected. The maximum number of remote nodes simultaneously/ever known to a node is limited by the maximum number of atoms available for node names. All data concerning remote nodes, except for the node name atom, are garbage-collected.
Connected nodes
The maximum number of simultaneously connected nodes is limited by either the maximum number of simultaneously known remote nodes, the maximum number of (Erlang) ports available, or the maximum number of sockets available.
Erlang connects Erlang nodes over the network, not Erlang processes. (And each Erlang node is an operating system process.)
So you will run out of TCP connections when you have several tens of thousands of Erlang nodes on a single machine (which is unreasonable), not when you have several tens of thousands of Erlang processes on a single Erlang node.
(I am not giving any absolute numbers, as each Erlang node also needs to communicate with epmd, which adds another network connection. But you will probably not have 30+K Erlang nodes, i.e. OS processes, on a single machine anyway.)
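To see that processes are cheap and do not consume sockets, a minimal sketch run in one Erlang shell (the default process limit is 262,144; start erl with +P to raise it further):

%% 100,000 idle processes on one node: no TCP connections involved.
1> Pids = [spawn(fun() -> receive stop -> ok end end) || _ <- lists:seq(1, 100000)].
2> length(Pids).
100000
3> [P ! stop || P <- Pids].   %% inter-process messages are in-memory sends, not sockets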