How to implement status in Erlang? - erlang

I am thinking an Erlang program that has many workers (loop receive), these workers almost always manipulate their status at the same time, ie. massive concurrent, the amount of workers is so big that keep their status in mnesia will cause performance problem, so I am thinking pass the status as args in each loop, then write to mnesia some time later. Is this a good practice? Is there a better way to do this? (roughly speaking, I'm looking for something like an instance with attributes in the object oriented language)
Thanks.

With Erlang, it is a good habit to see the processes as actor with a dedicated and limited role. With this in mind you will see that you will split your problem in different categories like:
Maintain the state of a connection with a user over the Internet,
Keep information such as login, user profile, friends, shop-cart...
log events
...
for each role you will have to decide if the state information must survive to the process.
In a lot of cases it is not necessary (case 1) and the solution is simply to keep the state in the argument of loop funtion of the process. I encourage you to look at the OTP behaviors, the gen_server and gen_fsm are made for this.
The case 2 obviously manipulates permanent data which must survive to a process crash or even a hardware crash. These data will be stored using dets, mnesia or any database adapted to your problem (Redis, CouchDB ...).
It is important to limit the information stored into external database, otherwise you will not benefit of this very powerful feature which is the lack of side effect. In other words, it is a very bad idea to have process behavior which depends on external information.

Related

Best-effort OTP supervision

What I'd like to do is change my supervisor to make a best effort to keep children running, but give up if their crash rate exceeds the intensity. That way the remainder of the children keep running. This doesn't appear to be possible with the existing supervisor configurations, though, so it looks like my only option may be to implement my own supervisor so I can have it behave this way when it receives EXIT.
Is there a way to implement custom OTP supervisor behavior like this without writing your own supervisor?
It sounds to me like what you want is an individual supervisor for each child, responsible for keeping it alive up to a limit, as you say, and as a layer above that have a single supervisor (one-for-one or simple-one-for-one) whose children are marked as temporary, so that when one of them gives up, the rest stay running.
You can't "extend" Supervisor to add different supervision behaviour, but you don't have to start from scratch either. The :supervisor module itself is implemented on top of :gen_server, so I would consult the source code of :supervisor (which you can find here) if you do find yourself needing some kind of custom supervision behaviour; it will give you a base to build from to avoid some of the pitfalls which you are likely to encounter.
I can expand my answer about alternative solutions once I have a better idea of your use case. As I mentioned in my comment, it sounds to me that you are likely doing something during init/1 of your processes which is prone to failure; init/1 is not the place to handle those things, because if it becomes impossible to succeed at that action temporarily, you will almost certainly blow the max restart intensity of the supervisor.
For example, let's assume you have a process which talks to the database, and requires a database connection; you do not want to try and connect to the database during init/1. Rather you should acquire the connection post-init (perhaps on first-use, or by immediately sending a post-init message to the process using Process.send_after(self(), :connect, 0)), and if the connection fails, return something like {:error, :database_unavailable} to any callers while you attempt to re-establish the connection. Designing with this approach will allow your supervision tree to remain stable, and it instead pushes the decision on how to deal with failure down to the clients who likely have better information on how it impacts them (i.e., should they retry the operation, return an error to their caller, exit with an exception, etc.)
You can use director too, it's more flexible for solving this problem.

is membase a good persistence layer for a erlang gamer server?

I aim to create a browser game where players can set up buildings.
Each building will have several modules (engines, offices,production lines, ...). Each module will have enentually one or more actions running, like creation of 2OO 'item X' with ingredients Y, Z.
The game server will be set up with erlang : An OTP application as the server itself, and nitrogen as the web front.
I need persistence of data. I was thinking about the following :
When somebody or something interacts with a building, or a timer representing some production line ends up, a supervisor spawns a gen_server (if not already spawned) which loads the state of the building from a database, so the gen_server can answer messages like 'add this module', 'starts this action', 'store this production to warehouse', 'die', etc. (
But when a building don't receive any messages during X seconds or minutes, he will terminate (thanks to the gen_server timeout feature) and drop its current state back to the database.
So, as it will be a (soft) real time game, the gen_server must be set up very fastly. I was thinking of membase as the database, because it's known to have very good response time.
My question is : when a gen server is up an running, his states fills some memory, and this state is present in the memory handled by membase too, so the state use two times his size in memory. Is that a bad design ?
Is membase a good solution to handle persistence in my case ? would be use mnesia a better choice , or something else ?
I fear mnesia 2 Go (or 4 ?) table size limit because i don't know at the moment the average state size of my gen_servers (buildings in this example, butalso players, production lines, whatever) and i may have someday more than 1 player :)
Thank you
I agree with Hynek -Pichi- Vychodil. Riak is a great thing for key-valye storage.
We use Riak almost 95% for the same thing you described. Everything works so far without any issues. In case you will hit performance limitation of Riak - add more nodes and it good to go!
Another cool thing about Riak is its very low performance degradation over the time. You can find more information about benchmarking Riak here: http://joyeur.com/2010/10/31/riak-smartmachine-benchmark-the-technical-details/
In case you go with it:
a driver: https://github.com/basho/riak-erlang-client
a connection pool you may need to work with it: https://github.com/dweldon/riakpool
About membase and memory usage: I also tried membase, but I found that it is not suitable for my tasks - (membase declares fault tolerance, but I could not setup it in the way it should work with faults, even with help from membase guys I didn't succeed). So at the moment I use the following architecture: All players that are online and play the game are presented as player-processes (gen_server). All data data and business logic for each player is in its player-process. From time to time each player-process desides to save its state in riak.
So far seems to be very fast and efficient approach.
Update: Now we are with PostgreSQL. It is awesome!
You can look to bitcask or other Riak backends to store your data. Avoid IPC is definitely good idea, so keep it inside Erlang.

In erlang is it possible to have concurrent readers of data owned by another process?

My understanding of the message passing system is that it is serialized and therefore all the reads from different processes are serialized even if the data isn't changing. I would like to have the data read concurrently if possible to take advantage of distributed computing. Is this possible?
You are correct in that messages will be handled sequentially in a process receiving them.
If the data really is static (well, even if it changes sometimes) consider using an ETS table for this kind of scenario. ETS tables are highly optimized for concurrent access whenever applicable. Unless someone is writing to an ETS table (or row) all clients can read the data concurrently from the table.
If you have different processes on the same computer (IMO, this is not a distributed computing), binary type is not serialized, it is passed by reference. So you can read large block of data by many processes without actually copying it. The very idea of "data read concurrently" in a really distributed world doesn't seem right to me (ETS is not an exception).
P.S. Well, what I meant in the last statement was "it doesn't save you from serializing".

Is the process dictionary appropriate in this case?

I've read several comments here and elsewhere suggesting that Erlang's process dictionary was a bad idea and should die. Normally, as a total Erlang newbie, I'd just avoid it. However, in this situation my other options aren't great.
I have a main dispatcher function that looks something like this:
dispatch(State) ->
receive
{cmd1, Params} ->
NewState = do_cmd1_stuff(Params, State),
dispatch(NewState);
{cmd2, Params} ->
NewState = do_cmd2_stuff(Params, State),
dispatch(NewState);
BadMsg ->
log_error(BadMsg),
dispatch(State)
end.
Obviously, my names are more meaningful to me, but that's the gist of it. Deep down in a function called by a function called by a function called by do_cmd2_stuff(), I want to send out messages to all my users telling them about something I've done. In order to do that, I need to get the list of users from the point where I send the messages. The user list doesn't lend itself easily to sticking in the global state, since that's just one data structure representing the only block of data on which I operate.
The way I see it, I have a couple unpleasant options other than using the process dictionary. I can send the user list through all the various levels of functions down to the very bottom one that does the broadcasting. That's unpleasant because it causes all my functions to gain a parameter, whether they really care about it or not.
Alternatively, I could have all the do_cmdN_stuff() functions return a message to send. That's not great either though, since sending the message may not be the last thing I want to do and it clutters up my dispatcher with a bunch of {Msg, NewState} tuples. Furthermore, some of the functions might not have any messages to send some of the time.
Like I said earlier, I'm very new to Erlang. Maybe someone with more experience can point me at a better way. Is there one? Is the process dictionary appropriate in this case?
The general rule is that if you have doubts, you shouldn't use the process dictionary.
If the two options you mentioned aren't good enough (I personally like the one where you return the messages to send) and what you want is some particular piece of code to track users and forward messages to them, maybe what you want to do is have a process holding that info.
Pid ! {forward, Msg}
where Pid will take care of sending everything to a bunch of other processes. Now, you would still need to pass the Pid around, unless you give it a name in some registry to find it. Either with register/2, global or gproc.
A simple answer would be to nest your global within a state record, which is then threaded through the system, at least at the stop level. This makes it easy to add new fields to the state in the future, not an uncommon occurrence, and allow you to keep your global state data structure untouched. So initially
-record(state, {users=[],state_data}).
Defining it as a record makes it easy to access and extend when necessary.
As you mentioned you can always pass the user list as extra param, thats not so bad.
If you don't want to do this just put it in State. You can have a special State just for this part of the calculation that also contains the user list.
Then there always is the possibility of putting it in ETS or in another server process.
What exactly to do is hard to recommend since it depends a lot on your exact application and preferences.
Just choose from the mentioned possibilities as if the process dictionary doesn't exist. Maybe your code needs restructuring if none of the variants look elegant, there always is some better way without the process dictionary.
Its really bad it is still there, because its alluring to many beginning Erlang users.
You really should not use process dictionary. I accept using dictionary only if
It is short living process.
I have full control about the process from spawn to termination i.e. I use minimum and well known set of external modules.
I need performance gain badly. It means avoid copy of data when using ets and dict/gb_tree is too slow (for GC reason).
ad 1. is not your case, you are using in server. ad 2. I don't know if it is your case. ad 3. is not your case because you need list of recipient so you don't gain nothing from that process dictionary is very fast key/value storage. In your case I don't see any reason why you should not include what you need to your State. IMHO State is exactly the right place for it.
Its an interesting question because it involves the fundamentals of functional design.
My opinion:
Try as much as possible to make the function return the messages, then send them. This separates the two different tasks nicely, and separates the purely functional task from the one that causes side effects.
If this isn't possible, pass receivers as argument even if its a bit messy. If the broadcasting function uses that data, it should be given to it explicitly, for clarity and predictability.
Using ETS as Peer Stritzinger suggests is really not any better than the PD, both hides the fact that the broadcasting function uses the receiver list and makes it dependent on global data.
I'm not sure about the Erlang way of encapsulating some state in a process, as I GIVE TERRIBLE ADVICE suggests. Is it really any better that ETS or PD?
clutters up my dispatcher with a bunch
of {Msg, NewState}
This is my experience also, that you often end up like this. It's not particularly pretty, but functional design seems to encourage this. Could some language feature be introduced to make it more beautiful and natural?
EDIT:
6 years ago I wrote:
Could some language feature be introduced to make it more beautiful and natural?
After learning much more about functional programming I have realised that examples of this are state-monads and do-notation that are found in Haskell.
I would consider sending a special message to self() from deep inside the call stack, and handling it at the top level dispatch method that you've sketched, where list of users is available.

Using gen_server to encapsulate an mnesia table?

I have a server application made in Erlang. In it I have an mnesia table
that store some information on photos. In the spirit of "everything is a
process" I decided to wrap that table in a gen_server module, so that the
gen_server module is the only one that directly accesses the table. Querying
and adding information to that table is done by sending messages to that process
(which has a registered name). The idea is that there will be several client
processes querying information from that table.
This works just fine, but that gen_server module has no state. Everything it
requires is stored in the mnesia table. So, I wonder if a gen_server is perhaps
not the best model for encapsulating that table?
Should I simply not make it a process, and instead only encapsulate the table
through the functions in that module? In case of a bug in that module, that
would cause the calling process to crash, which I think might be better, because
it would only affect a single client, as opposed to now, when it would cause the
gen_server process to crash, leaving everyone without access to the table (until
the supervisor restarts it).
Any input is greatly appreciated.
I guess according to Occam's razor there is no need for this gen_server to exist, especially since there is absolutely no state stored in it. Such process could be needed in situations when you need access to the table (or any other resource) to be strictly sequential (for example you might want to avoid any aborted transactions at cost of a bottleneck).
Encapsulating access to the table in a module is a good solution. It creates no additional complexity, while providing proper level of abstraction and encapsulation.
I'm not sure I understand why you've decided to encapsulate a table with a process. Mnesia is designed to mediate multiple concurrent accesses to tables, both locally and distributed across a cluster.
Creating an API module that performs all the particular table access operations and updates is a good idea as the API functions will convey your intent better in the code that calls them. It will be more readable than putting the mnesia operations directly into the calling code.
An API module also gives you the option to switch from mnesia to some other storage system later if you need to. Using mnesia transactions inside your API module protects you from some programming errors as mnesia will roll-back operations that crash. The API module will always be available to callers and allows any number of callers to perform operations concurrently, whereas a gen_server based API has a point of failure, the process, that can render the API unavailable.
The only thing a gen_server based API gives you over a pure-functional API is to serialize access to the table - which is an unusual requirement and unless you specifically need it, it will be a performance killer.
It may be a good idea to handle a mnesia table using single gen_server process when you want to use dirty access and avoid transactions. This approach might be faster than txs, but as usually you need to benchmark it.

Resources