is membase a good persistence layer for a erlang gamer server? - erlang

I aim to create a browser game where players can set up buildings.
Each building will have several modules (engines, offices,production lines, ...). Each module will have enentually one or more actions running, like creation of 2OO 'item X' with ingredients Y, Z.
The game server will be set up with erlang : An OTP application as the server itself, and nitrogen as the web front.
I need persistence of data. I was thinking about the following :
When somebody or something interacts with a building, or a timer representing some production line ends up, a supervisor spawns a gen_server (if not already spawned) which loads the state of the building from a database, so the gen_server can answer messages like 'add this module', 'starts this action', 'store this production to warehouse', 'die', etc. (
But when a building don't receive any messages during X seconds or minutes, he will terminate (thanks to the gen_server timeout feature) and drop its current state back to the database.
So, as it will be a (soft) real time game, the gen_server must be set up very fastly. I was thinking of membase as the database, because it's known to have very good response time.
My question is : when a gen server is up an running, his states fills some memory, and this state is present in the memory handled by membase too, so the state use two times his size in memory. Is that a bad design ?
Is membase a good solution to handle persistence in my case ? would be use mnesia a better choice , or something else ?
I fear mnesia 2 Go (or 4 ?) table size limit because i don't know at the moment the average state size of my gen_servers (buildings in this example, butalso players, production lines, whatever) and i may have someday more than 1 player :)
Thank you

I agree with Hynek -Pichi- Vychodil. Riak is a great thing for key-valye storage.
We use Riak almost 95% for the same thing you described. Everything works so far without any issues. In case you will hit performance limitation of Riak - add more nodes and it good to go!
Another cool thing about Riak is its very low performance degradation over the time. You can find more information about benchmarking Riak here: http://joyeur.com/2010/10/31/riak-smartmachine-benchmark-the-technical-details/
In case you go with it:
a driver: https://github.com/basho/riak-erlang-client
a connection pool you may need to work with it: https://github.com/dweldon/riakpool
About membase and memory usage: I also tried membase, but I found that it is not suitable for my tasks - (membase declares fault tolerance, but I could not setup it in the way it should work with faults, even with help from membase guys I didn't succeed). So at the moment I use the following architecture: All players that are online and play the game are presented as player-processes (gen_server). All data data and business logic for each player is in its player-process. From time to time each player-process desides to save its state in riak.
So far seems to be very fast and efficient approach.
Update: Now we are with PostgreSQL. It is awesome!

You can look to bitcask or other Riak backends to store your data. Avoid IPC is definitely good idea, so keep it inside Erlang.

Related

django channels and running event loop

For a game website, I want a player to contest either agains a human or an AI.
I am using Django + Channels (Django-4.0.2 asgiref-3.5.0 channels-3.0.4)
This is a long way of learning...
Human vs Human: the game take place is the web browser turn by turn. Each time a player connects, it opens a websocket connexion, a move is sent through the socket, processed by the consumer (validated and saved in the database) and sent to the other player.
It is managed only with sync programming.
Human vs AI: I try to use the same route as previously. A test branch check if the game is against the computer and process a move instead of receiving it from the other end of the websocket. This AI move can be a blocking operation as it can take from 2 to 5sec.
I don't want the receive method of the consumer to wait for the AI to return its move, since I have other operations to do quickly (like update some informations on the client side).
Then I thought I could easily take advantage of the allegedly already existing event loop of the channels framework. I could send the AI thinking process to this loop and return the result later to the client through the send method of the consumer.
However, when I write:
loop = asyncio.get_event_loop()
loop.create_task(my_AI_thinking())
Django raises a runtime effort error (the same as described here: https://github.com/django/asgiref/issues/278) telling me there is no running event loop.
The solution seemed to be to upgrade asgiref to 3.5.0 which I did but issue not solved.
I think I am a little bit short of background, and some enlightments should help me to understand a little bit more what is the root cause of this fail.
My first questions would be:
In the combo django + channels + asgi: which is in charge to run the eventloop?
How to check if indeed one event loop is running whatever the thread?
Maybe your answers wil raise other questions.
Did you try running your event_loop example on Django 3.2? (and/or with different Python version)? I experienced various problems with Django 4.0 & Python 3.10, so I keep with Django 3.2 and Python3.7/3.8/3.9 for now, maybe your errors are one of these problems?
If you won't be able to get event_loop running, I see two possible alternative solutions:
Open two WS connections: one only for the moves, and the other for all the other stuff, such as updating information on Player's UI, etc.
You can also use multiprocessing to "manually" send calculating AI move to other thread, and then join the two threads again, after receiving the result (the move). To be honest, multiprocessing in Python is quite simple -- it's pretty handy, if you are familiar with the idea of multithreaded applications.
Unfortunately, I have not yet used event loops in channels myself, maybe someone more experienced in that matter will be able to better address your issue.

How to implement status in Erlang?

I am thinking an Erlang program that has many workers (loop receive), these workers almost always manipulate their status at the same time, ie. massive concurrent, the amount of workers is so big that keep their status in mnesia will cause performance problem, so I am thinking pass the status as args in each loop, then write to mnesia some time later. Is this a good practice? Is there a better way to do this? (roughly speaking, I'm looking for something like an instance with attributes in the object oriented language)
Thanks.
With Erlang, it is a good habit to see the processes as actor with a dedicated and limited role. With this in mind you will see that you will split your problem in different categories like:
Maintain the state of a connection with a user over the Internet,
Keep information such as login, user profile, friends, shop-cart...
log events
...
for each role you will have to decide if the state information must survive to the process.
In a lot of cases it is not necessary (case 1) and the solution is simply to keep the state in the argument of loop funtion of the process. I encourage you to look at the OTP behaviors, the gen_server and gen_fsm are made for this.
The case 2 obviously manipulates permanent data which must survive to a process crash or even a hardware crash. These data will be stored using dets, mnesia or any database adapted to your problem (Redis, CouchDB ...).
It is important to limit the information stored into external database, otherwise you will not benefit of this very powerful feature which is the lack of side effect. In other words, it is a very bad idea to have process behavior which depends on external information.

Assess scalability of Erlang

I'm having some difficulties thinking about a good way to assess the scalability of my school project. The assignment is simply put a twitter-service. Very bare bones. The main goal is to make it as scalable as possible. However, it's equally important to know how and why the assignment would scale. So the point is not really to create a very scalable project, but mainly to create a few different architectures for the server and see which one outperforms which one and why.
So so far I have the basic server architecture like this:
* 1 main process which holds the data
* 1 unique process server-side per user
A user sends messages to his own server-side process, which then simply delegates those messages to the central process.
To test this I would spawn 1 or more processes which would act as clients. I would spam the server with tweets and then assess how well it can withstand a certain load.
Now, to assess the scalability I came up with following metrics:
First off, a process is being a bottleneck if it's message queue is piling up. So, I would store the queuelength every time a tweet is processed (or every N tweets) and in the end calculate the average. If I run it on more cores and the average queue length goes down, it scales better.
Second, if I create N users to spam the server (on N processes or less) I simply time how long it takes for the server to process all these tweets.
Is there a better way to do this? I can't stop thinking that there should be better metrics..
Edt:
So far I have tried fprof and eprof. These tools however, show me how much time is spent in certain methods. While this is a good indicator where I can improve my code, it's not really a good indicator for scalability. It would be better if it would for example show the time spent per process.
Look at percept and percept2 if you are really interested in this.

Erlang in-memory linked list -- Exhausting memory?

I tried to make an Erlang in-memory datastore that would receive messages and add them to a list. Here's the current incarnation. The trouble is, I'm receiving about 200 messages per second and this easily exhausts the memory available.
Once a minute, I send a {write, Pid} message that should clear out and clean up this list, but it doesn't look like it's being garbage collected.
What am I doing wrong? I think I'm approaching this from the completely wrong direction...
datastore(Db) ->
receive
{put, Data} ->
datastore(lists:concat([Data,Db]));
{write, Responder} ->
ScratchName = "ScratchFile.dat",
{ok, ScratchDevice} = file:open(ScratchName,[write]),
file:write(ScratchDevice,Db),
ok = file:close(ScratchDevice),
Responder ! {load, ScratchName},
datastore([])
end.
First spontaneous comment is that file:open will open the file, truncate it, and then write to it. So every time in the loop will overwrite any previous data. So if the Responder is slow with its loading of the file, there could be data you did not expect in the file.
Second reaction is that you don't have to do this buffering yourself. If you open the file with the option {delayed_write, Size, Delay}, and set Size and Delay to values that fit your purpose, you get precisely what you are trying to implement here by just writing all the time.
Third reaction is that you are probably doing the wrong thing if you use a file to communicate between different parts of your system. What are you attempting to do?
ps.
If you need a new random filename, you can easily generate one with erlang:now/0 and io_lib:format/2. As an added bonus they will sort in creation order.
This is a very wrong way of buffering in Erlang. Data Structures such as ETS (http://www.erlang.org/doc/man/ets.html) have been designed to handle thousands and millions of IN-MEMORY Erlang Data Structures with ease. Please, do not use Lists or Queues for handling too much data. If a part of your code will be handling data which other parts of the application are supposed to consume and yet you know that the consumers will be doing it a slower rate as compared to the part that is generating or getting the data, then you need a more robust way of buffering (ETS Tables).
Another thing is that usually, processes are a point of failure in a system. If a process is used to buffer or hold on to very essential data, even if that data is instantaneous but critical to the system, what would happen at that time when the process exits or dies ? ETS tables have been designed in a way that they can provide data access to all processes even applications within the same VM (of type public). In this way, all processes can use the data, reading as much as they want (concurrently) but what you would do is to ensure consistency by having one writer / updater.
ETS Tables rarely fail in an application as compared to the frequency at which processes fail. Most recently, a method that helps us to redeem data in a failing ETS table has been introduced ( ets:give_away/3 ).
Another thing, in a comment above, you have mentioned that you are working for a large Company. Usually, with large teams, its better you evaluate a number of options and make intensive tests against several depending the nature of the application you are developing. To avoid side effects, its best that you identify which data structures are best to use for what. For example, for in-memory storage, capable of handling 200 messages per second, if tested properly, Lists and Files would fail against ETS Tables.

using Kernel#fork for backgrounding processes, pros? cons?

I'd like some thoughts on whether using fork{} to 'background' a process from a rails app is such a good idea or not...
From what I gather fork{my_method; Process#setsid} does in fact do what it's supposed to do.
1) creates another processes with a different PID
2) doesn't interrupt the calling process (e.g. it continues w/o waiting for the fork to finish)
3) executes the child until it finishes
..which is cool, but is it a good idea? What exactly is fork doing? Does it create a duplicate instance of my entire rails mongrel/passenger instance in memory? If so that would be very bad. Or, does it somehow do it without consuming a huge swath of memory.
My ultimate goal was to do away with my background daemon/queue system in favor of forking these processes (primarily sending emails) -- but if this won't save memory then it's definitely a step in the wrong direction
The fork does make a copy of your entire process, and, depending on exactly how you are hooked up to the application server, a copy of that as well. As noted in the other discussion this is done with copy-on-write so it's tolerable. Unix is built around fork(2), after all, so it has to manage it fairly fast. Note that any partially buffered I/O, open files, and lots of other stuff are also copied, as well as the state of the program that is spring-loaded to write them out, which would be incorrect.
I have a few thoughts:
Are you using Action Mailer? It seems like email would be easily done with AM or by Process.popen of something. (Popen will do a fork, but it is immediately followed by an exec.)
immediately get rid of all that state by executing Process.exec of another ruby interpreter plus your functionality. If there is too much state to transfer or you really need to use those duplicated file descriptors, you might do something like IO#popen instead so you can send the subprocess work to do. The system will share the pages containing the text of the Ruby interpreter of the subprocess with the parent automatically.
in addition to the above, you might want to consider the use of the daemons gem. While your rails process is already a daemon, using the gem might make it easier to keep one background task running as a batch job server, and make it easy to start, monitor, restart if it bombs, and shut down when you do...
if you do exit from a fork(2)ed subprocess, use exit! instead of exit
having a message queue and a daemon already set up, like you do, kinda sounds like a good solution to me :-)
Be aware that it will prevent you from using JRuby on Rails as fork() is not implemented (yet).
The semantics of fork is to copy the entire memory space of the process into a new process, but many (most?) systems will do that by just making a copy of the virtual memory tables and marking it copy-on-write. That means that (at first, at least) it doesn't use that much more physical memory, just enough to make the new tables and other per-process data structures.
That said, I'm not sure how well Ruby, RoR, etc. interacts with copy-on-write forking. In particular garbage collection could be problematic if it touches many memory pages (causing them to be copied).

Resources