Context:
I have a system that will need to support 20,000 connected chat users spread over 100 chat rooms. During performance testing I've found that I can get up to 6,000 connected users on a single box before I get a crash dump, so in production I'll probably go with four servers in a cluster.
My Question:
I understand that a chatroom is bound to a server node, so that if the node dies the chatroom disappears with it and the users no longer belong to the room. Is there a way to "replicate" a chatroom over to another node so that users who are left behind are moved to the replicated room? If not, what do you do to keep continuity for the users?
What hardware are you using ? 6000 connected users seem a bit low. Also, ejabberd is not supposed to crash under load. It might slow down, but not crash.
There is something wrong in your setup.
About replicating a chatroom node, it's not easy. It's better to handle smooth reconnection on the client side.
But then again, ejabberd should not crash under this kind of load, unless something's wrong.
Related
I was thinking about a project the other day and an idea stumbled across my mind:
How would you set up a server, for instance at a restaurant, and have an app of the restaurant chain which finds and directly connects to the locally running server without being in the same network. This should work at higher ranges, so simple Bluetooth is not fitting here.
I've done some research, but I'm not very... good with protocols, and nothing I've found seems to really fit that goal. So I'm wondering what the best way to code this connection would be. Any ideas would be helpful :)
Here are the requirements:
A standard phone should be able to perform the operation.
It should have a fairly high range, think a big store or restaurant.
Server and device do not have to be in the same network.
The connection get's established by the phone on app start, to a server that can do whatever it needs to get this done.
In a Big-IP LTM system I have http monitors setup for a pool so that system owners can remove a file a node in the pool to remove a node from rotation. But monitors mark a node as disabled, not offline, so cookie-based persistence will still send existing users to the node that should be down. Whats the best way to use monitors to either offline a node instead of disabling it, or forcing users to a new node despite persistence?
Disabling a pool member still allows active connections/persistent connections to function. And depending on how you have persistence defined, that can end up being a LONG time.
Forced Offline still allows active connections to complete their transactions but would move previously persisted traffic to other nodes.
When doing maintenance I would force the node offline and then give sessions 5 to 10 minutes to complete before taking the node down in infrastructure. There's no good way to drop a node with active connections further dependent on what the client is actually doing.
Here's a great response on F5's Community by one of their MVP's to help explain connections.
Pool Member disabled/forced offline behavior # DevCentral
Let me know if you need more details or if you're seeing different behavior. Also, you can do all of this with REST API so you don't need to use the GUI. Doing a quick Node Offline/Online is super easy and quick.
If develop a online real time game with websocket, multiplayers running on the different containers, how to sync data when add or reduce containers if they are playing?
Does kubernetes has any good feature on this case?
ThatBrianDude already gave an awesome answer, and mine will not be that good. But I think your last comment gave us more hints about the architecture you have in mind. I hope my humble answer will shed a light on more ideas to your game. Here are some suggestions:
First, avoid keeping any state in the websocket apps.
The basic idea with containers is that they should be stateless.
ThatBrianDude
So, why not use caches and a messaging layer to help you with that. Imagine the following examples:
Situation 1: if the client sends an action to the websocket server, the server should put it in a queue/topic (some other service will process it later on).
Situation 2: The server might also listen to a(some) topic(s) for some types of messages, and send them back to the clients that need that information.
Situation 3: when the client asks for information or if the websocket server needs some information to send to the client, the server must read it from a cache, as reading from DB might be slow for a multiplayer game.
Situation 4: eventually a container is killed. The clients connected to that server will receive a connection error, and should reconnect. That means another handshake, and the player might feel it, depending on what the game was doing, so killing a container should not happen that often. But that would be just it, no information is lost.
This way, the websocket server containers are totally stateless, and the messaging topics and caches will help you to: provide all the information needed to containers, and; keep websockets, persistance and processing isolated and scalable.
Summing up, the information would flow like this:
clients are showering the websocket server containers with actions
websocket servers just send them to the messaging layer
processing containers (which can be scalled too!) receive those messages, process them, save to the database and/or to a cache and eventually send more messages to other topics
(optional) websocket servers receive those messages and send them to the clients.
Or like this:
clients ask for information or websocket servers periodically need to send the world state to clients
websocket servers look up the information in the cache
and send it to the clients.
Or even like this:
Some processing servers are independent of messages, they just read the game/world state (from the cache?) periodically
they process the physics and mechanics of the game
and save the result back in the cache, which will be sent to the clients by the websocket servers periodically, or send it in a topic so the websocket server can listen to it and send it to the clients.
Lastly, don't forget the suggestion to have one machine responsible for one game/world. It would be nice if each processing server (or each thread of a server) works with one game/world. That would make it easier to persist things without the need to sync stuff.
The basic idea with containers is that they should be stateless.
This means that any persistant data your game might have (highscores etc.) must be saved to a persistant DB whereas other temporary data like current ingame score or nickname etc. can stay inside the memory of the container and be gone once the container dies.
how to sync data when add or reduce containers if they are playing?
This sounds like you want to use multiple containers computing one game world?
Thats a whole other beast on its own but you might want to take a look at SpatialOS which pretty much allows for massive multiplayer worlds and is designed for games that require more than one machine per world.
If thats not what you are looking for I would recommend you to keep one machine responsible for one game/world as you will avoid high complexity when you try to sync stuff later on.
I just finished Erlang in Practice screencasts (code here), and have some questions about distribution.
Here's the is overall architecture:
Here is how to the supervision tree looks like:
Reading Distributed Applications leads me to believe that one of the primary motivations is for failover/takeover.
However, is it possible, for example, the Message Router supervisor and its workers to be on one node, and the rest of the system to be on another, without much changes to the code?
Or should there be 3 different OTP applications?
Also, how can this system be made to scale horizontally? For example if I realize now that my system can handle 100 users, and that I've identified the Message Router as the main bottleneck, how can I 'just add another node' where now it can handle 200 users?
I've developed Erlang apps only during my studies, but generally we had many small processes doing only one thing and sending messages to other processes. And the beauty of Erlang is that it doesn't matter if you send a message within the same Erlang VM or withing the same Computer, same LAN or over the Internet, the call and the pointer to the other process looks always the same for the developer.
So you really want to have one application for every small part of the system.
That being said, it doesn't make it any simpler to construct an application which can scale out. A rule of thumb says that if you want an application to work on a factor of 10-times more nodes, you need to rewrite, since otherwise the messaging overhead would be too large. And obviously when you start from 1 to 2 you also need to consider it.
So if you found a bottleneck, the application which is particularly slow when handling too many clients, you want to run it a second time and than you need to have some additional load-balancing implemented, already before you start the second application.
Let's assume the supervisor checks the message content for inappropriate content and therefore is slow. In this case the node, everyone is talking to would be simple router application which would forward the messages to different instances of the supervisor application, in a round robin manner. In case those 1 or 2 instances are not enough, you could have the router written in a way, that you can manipulate the number of instances by sending controlling messages.
However for this, to work automatically, you would need to have another process monitoring the servers and discovering that they are overloaded or under utilized.
I know that dynamically adding and removing resources always sounds great when you hear about it, but as you can see it is a lot of work and you need to have some messaging system built which allows it, as well as a monitoring system which can monitor the need.
Hope this gives you some idea of how it could be done, unfortunately it's been over a year since I wrote my last Erlang application, and I didn't want to provide code which would be possibly wrong.
I'm building a small network server for a multi-player board game using Erlang.
This network server uses a local instance of Mnesia DB to store a session for each connected client app. Inside each client's record (session) stored in this local Mnesia, I store the client's PID and NODE (the node where a client is logged in).
I plan to deploy this network server on at least 2 connected servers (Node A & B).
So in order to allow a Client A who is logged in on Node A to search (query to Mnesia) for a Client B who is logged in on Node B, I replicate the Mnesia session table from Node A to Node B or vise-versa.
After Client A queries the PID and NODE of the Client B, then Client A and B can communicate with each other directly.
Is this the right way of establishing connection between two client apps that are logged-in on two different Erlang nodes?
Creating a system where two or more nodes are perfectly in sync is by definition impossible. In practice however, you might get close enough that it works for your particular problem.
You don't say the exact reason behind running on two nodes, so I'm going to assume it is for scalability. With many nodes, your system will also be more available and fault-tolerant if you get it right. However, the problem could be simplified if you know you only ever will run in a single node, and need the other node as a hot-slave to take over if the master is unavailable.
To establish a connection between two processes on two different nodes, you need some global addressing(user id 123 is pid<123,456,0>). If you also care about only one process running for User A running at a time, you also need a lock or allow only unique registrations of the addressing. If you also want to grow, you need a way to add more nodes, either while your system is running or when it is stopped.
Now, there are already some solutions out there that helps solving your problem, with different trade-offs:
gproc in global mode, allows registering a process under a given key(which gives you addressing and locking). This is distributed to the entire cluster, with no single point of failure, however the leader election (at least when I last looked at it) works only for nodes that was available when the system started. Adding new nodes requires an experimental version of gen_leader or stopping the system. Within your own code, if you know two players are only going to ever talk to each other, you could start them on the same node.
riak_core, allows you to build on top of the well-tested and proved architecture used in riak KV and riak search. It maps the keys into buckets in a fashion that allows you to add new nodes and have the keys redistributed. You can plug into this mechanism and move your processes. This approach does not let you decide where to start your processes, so if you have much communication between them, this will go across the network.
Using mnesia with distributed transactions, allows you to guarantee that every node has the data before the transaction is commited, this would give you distribution of the addressing and locking, but you would have to do everything else on top of this(like releasing the lock). Note: I have never used distributed transactions in production, so I cannot tell you how reliable they are. Also, due to being distributed, expect latency. Note2: You should check exactly how you would add more nodes and have the tables replicated, for example if it is possible without stopping mnesia.
Zookeper/doozer/roll your own, provides a centralized highly-available database which you may use to store the addressing. In this case you would need to handle unregistering yourself. Adding nodes while the system is running is easy from the addressing point of view, but you need some way to have your application learn about the new nodes and start spawning processes there.
Also, it is not necessary to store the node, as the pid contains enough information to send the messages directly to the correct node.
As a cool trick which you may already be aware of, pids may be serialized (as may all data within the VM) to a binary. Use term_to_binary/1 and binary_to_term/1 to convert between the actual pid inside the VM and a binary which you may store in whatever accepts binary data without mangling it in some stupid way.