Is the process dictionary appropriate in this case? - erlang

I've read several comments here and elsewhere suggesting that Erlang's process dictionary was a bad idea and should die. Normally, as a total Erlang newbie, I'd just avoid it. However, in this situation my other options aren't great.
I have a main dispatcher function that looks something like this:
dispatch(State) ->
receive
{cmd1, Params} ->
NewState = do_cmd1_stuff(Params, State),
dispatch(NewState);
{cmd2, Params} ->
NewState = do_cmd2_stuff(Params, State),
dispatch(NewState);
BadMsg ->
log_error(BadMsg),
dispatch(State)
end.
Obviously, my names are more meaningful to me, but that's the gist of it. Deep down in a function called by a function called by a function called by do_cmd2_stuff(), I want to send out messages to all my users telling them about something I've done. In order to do that, I need to get the list of users from the point where I send the messages. The user list doesn't lend itself easily to sticking in the global state, since that's just one data structure representing the only block of data on which I operate.
The way I see it, I have a couple unpleasant options other than using the process dictionary. I can send the user list through all the various levels of functions down to the very bottom one that does the broadcasting. That's unpleasant because it causes all my functions to gain a parameter, whether they really care about it or not.
Alternatively, I could have all the do_cmdN_stuff() functions return a message to send. That's not great either though, since sending the message may not be the last thing I want to do and it clutters up my dispatcher with a bunch of {Msg, NewState} tuples. Furthermore, some of the functions might not have any messages to send some of the time.
Like I said earlier, I'm very new to Erlang. Maybe someone with more experience can point me at a better way. Is there one? Is the process dictionary appropriate in this case?

The general rule is that if you have doubts, you shouldn't use the process dictionary.
If the two options you mentioned aren't good enough (I personally like the one where you return the messages to send) and what you want is some particular piece of code to track users and forward messages to them, maybe what you want to do is have a process holding that info.
Pid ! {forward, Msg}
where Pid will take care of sending everything to a bunch of other processes. Now, you would still need to pass the Pid around, unless you give it a name in some registry to find it. Either with register/2, global or gproc.

A simple answer would be to nest your global within a state record, which is then threaded through the system, at least at the stop level. This makes it easy to add new fields to the state in the future, not an uncommon occurrence, and allow you to keep your global state data structure untouched. So initially
-record(state, {users=[],state_data}).
Defining it as a record makes it easy to access and extend when necessary.

As you mentioned you can always pass the user list as extra param, thats not so bad.
If you don't want to do this just put it in State. You can have a special State just for this part of the calculation that also contains the user list.
Then there always is the possibility of putting it in ETS or in another server process.
What exactly to do is hard to recommend since it depends a lot on your exact application and preferences.
Just choose from the mentioned possibilities as if the process dictionary doesn't exist. Maybe your code needs restructuring if none of the variants look elegant, there always is some better way without the process dictionary.
Its really bad it is still there, because its alluring to many beginning Erlang users.

You really should not use process dictionary. I accept using dictionary only if
It is short living process.
I have full control about the process from spawn to termination i.e. I use minimum and well known set of external modules.
I need performance gain badly. It means avoid copy of data when using ets and dict/gb_tree is too slow (for GC reason).
ad 1. is not your case, you are using in server. ad 2. I don't know if it is your case. ad 3. is not your case because you need list of recipient so you don't gain nothing from that process dictionary is very fast key/value storage. In your case I don't see any reason why you should not include what you need to your State. IMHO State is exactly the right place for it.

Its an interesting question because it involves the fundamentals of functional design.
My opinion:
Try as much as possible to make the function return the messages, then send them. This separates the two different tasks nicely, and separates the purely functional task from the one that causes side effects.
If this isn't possible, pass receivers as argument even if its a bit messy. If the broadcasting function uses that data, it should be given to it explicitly, for clarity and predictability.
Using ETS as Peer Stritzinger suggests is really not any better than the PD, both hides the fact that the broadcasting function uses the receiver list and makes it dependent on global data.
I'm not sure about the Erlang way of encapsulating some state in a process, as I GIVE TERRIBLE ADVICE suggests. Is it really any better that ETS or PD?
clutters up my dispatcher with a bunch
of {Msg, NewState}
This is my experience also, that you often end up like this. It's not particularly pretty, but functional design seems to encourage this. Could some language feature be introduced to make it more beautiful and natural?
EDIT:
6 years ago I wrote:
Could some language feature be introduced to make it more beautiful and natural?
After learning much more about functional programming I have realised that examples of this are state-monads and do-notation that are found in Haskell.

I would consider sending a special message to self() from deep inside the call stack, and handling it at the top level dispatch method that you've sketched, where list of users is available.

Related

How to implement status in Erlang?

I am thinking an Erlang program that has many workers (loop receive), these workers almost always manipulate their status at the same time, ie. massive concurrent, the amount of workers is so big that keep their status in mnesia will cause performance problem, so I am thinking pass the status as args in each loop, then write to mnesia some time later. Is this a good practice? Is there a better way to do this? (roughly speaking, I'm looking for something like an instance with attributes in the object oriented language)
Thanks.
With Erlang, it is a good habit to see the processes as actor with a dedicated and limited role. With this in mind you will see that you will split your problem in different categories like:
Maintain the state of a connection with a user over the Internet,
Keep information such as login, user profile, friends, shop-cart...
log events
...
for each role you will have to decide if the state information must survive to the process.
In a lot of cases it is not necessary (case 1) and the solution is simply to keep the state in the argument of loop funtion of the process. I encourage you to look at the OTP behaviors, the gen_server and gen_fsm are made for this.
The case 2 obviously manipulates permanent data which must survive to a process crash or even a hardware crash. These data will be stored using dets, mnesia or any database adapted to your problem (Redis, CouchDB ...).
It is important to limit the information stored into external database, otherwise you will not benefit of this very powerful feature which is the lack of side effect. In other words, it is a very bad idea to have process behavior which depends on external information.

OTP behaviours: gen_fsm; gen_event. Practical examples?

I have used both the supervisor and gen_server behaviours, and I can understand the practical uses for both of them. However, I don't really understand the use of the gen_fsm and the gen_event behaviours. Can someone clarify with practical examples?
Thanks in advance
One classic example for FSM would be lock with timeout which is mentioned in manual,
Another example which I implemented in my experience would be telephone lines, because phone have states, like ringing, connected, disconnected etc and some operations are allowed and some are not allowed during this states.
An example for event is logging used in https://github.com/basho/lager
gen_fsm is a neat implementation of finite state machine, you cane do roughly the same thing that you do with a gen_server, and in addition manage easily the different states of your application (for example in a game server select a level, a table, modify player attribute, play, save, restore...).
gen-event is an easy way to dispach event, your application send all event to the gen_event knowing nothing about potential usage, and you dynamically add and delete handlers, with different behavior (log in file, in a database, display information in graphical interface...). I have used this to have a graphical view of the processes state and communication of my application, and file log for performance analysis.
Some good examples you can find them here:
"Event handlers" and "Finite State Machines"
gen_fsm:
The gen_fsm behaviour is somewhat similar to gen_server in that it is
a specialised version of it. The biggest difference is that rather
than handling calls and casts, we're handling synchronous and
asynchronous events. Much like our dog and cat examples, each state is
represented by a function. Again, we'll go through the callbacks our
modules need to implement in order to work.
gen_event:
The gen_event behaviour differs quite a bit from the gen_server and
gen_fsm behaviours in that you are never really starting a process.
The gen_event behaviour basically runs the process that accepts and
calls functions, and you only provide a module with these functions.
This is to say, you have nothing to do with regards to event
manipulation except give your callback functions in a format that
pleases the event manager. All managing is done for free; you only
provide what's specific to your application. This is not really
surprising given OTP is, again, all about separating what's generic
from specific.

Clone a lua state

Recently, I have encountered many difficulties when I was developing using C++ and Lua. My situation is: for some reason, there can be thousands of Lua-states in my C++ program. But these states should be same just after initialization. Of course, I can do luaL_loadlibs() and lua_loadfile() for each state, but that is pretty heavy(in fact, it takes a rather long time for me even just initial one state). So, I am wondering the following schema: What about keeping a separate Lua-state(the only state that has to be initialized) which is then cloned for other Lua-states, is that possible?
When I started with Lua, like you I once wrote a program with thousands of states, had the same problem and thoughts, until I realized I was doing it totally wrong :)
Lua has coroutines and threads, you need to use these features to do what you need. They can be a bit tricky at first but you should be able to understand them in a few days, it'll be well worth your time.
take a look to the following lua API call I think it is what you exactly need.
lua_State *lua_newthread (lua_State *L);
This creates a new thread, pushes it on the stack, and returns a pointer to a lua_State that represents this new thread. The new thread returned by this function shares with the original thread its global environment, but has an independent execution stack.
There is no explicit function to close or to destroy a thread. Threads are subject to garbage collection, like any Lua object.
Unfortunately, no.
You could try Pluto to serialize the whole state. It does work pretty well, but in most cases it costs roughly the same time as normal initialization.
I think it will be hard to do exactly what you're requesting here given that just copying the state would have internal references as well as potentially pointers to external data. One would need to reconstruct those internal references in order to not just have multiple states pointing to the clone source.
You could serialize out the state after one starts up and then load that into subsequent states. If initialization is really expensive, this might be worth it.
I think the closest thing to doing what you want that would be relatively easy would be to put the states in different processes by initializing one state and then forking, however your operating system supports it:
http://en.wikipedia.org/wiki/Fork_(operating_system)
If you want something available from within Lua, you could try something like this:
How do you construct a read-write pipe with lua?

Event manager process in erlang. Named processes or Pids?

I have event manager process that dispatches events to subscribers (e.g. http_session_created, http_sesssion_destroyed). If Pid is used instead of named process, I must put it into functions to operate with event manager but if Named process is used, code will be more clear.
Which variant is right?
Thank you!
While there is no actual difference to the process naming a process, registering it, makes it global. You in essence you are telling the system that here is a global service which anyone can use.
From you description it more sounds like you are giving them names to save the, small, effort of carrying them around in your loop. If this is the case I would put their pids in a record with all the other state data you carry around. This much better indicates the type of the processes.
If you have a fixed set of "subscriber" processes, then use registered names IMO.
If, on the contrary, you have a publish/subscribe sort of architecture where subscribers come and go, then you need an infrastructure to track those and from this point it doesn't really matter if you use Pid() or names.
One of the drawbacks of using registered names is that you need to track them in your code base to avoid "collisions". So it is up to you: personally, I tend to favor named processes as, like you say, it makes reading the code clearer. One way or another, OTP doesn't care.

Erlang gen_server vs stateless module

I've recently finished Joe's book and quite enjoyed it.
I'm since then started coding a soft realtime application with erlang and I have to say I am a bit confused at the use of gen_server.
When should I use gen_server instead of a simple stateless module?
I define a stateless module as follow:
- A module that takes it's state as a parameter (much like ETS/DETS) as opposed to keeping it internally (like gen_server)
Say for an invoice manager type module, should it initialize and return state which I'd then pass subsequently to it?
SomeState = InvoiceManager:Init(),
SomeState = InvoiceManager:AddInvoice(SomeState, AnInvoiceFoo).
Suppose I'd need multiple instances of the invoice manager state (say my application manages multiple companies each with their own invoices), should they each have a gen_server with internal state to manage their invoices or would it better fit to simply have the stateless module above?
Where is the line between the two?
(Note the invoice manage example above is just that, an example to illustrate my question)
I don't really think you can make that distinction between what you call a stateless module and gen_server. In both cases there is a recursive receive loop which carries state in at least one argument. This main loop handles requests, does work depending on the requests and, when necessary, sends results back the requesters. The main loop will most likely handle a number of administrative requests as well which may not be part of the main API/protocol.
The difference is that gen_server abstracts away the main receive loop and allows the user to only the write the actual user code. It will also handle many administrative OTP functions for you. The main difference is that the user code is in another module which means that you see the passed through state more easily. Unless you actually manage to write your code in one big receive loop and not call other functions to do the work there is no real difference.
Which method is better depends very much on what you need. Using gen_server will simplify your code and give you added functionality "for free" but it can be more restrictive. Rolling your own will give you more power but also you give more possibilities to screww things up. It is probably a little faster as well. What do you need?
It strongly depend of your needs and application design. When you need shared state between processes you have to use process to keep this state. Then gen_server, gen_fsm or other gen_* is your friend. You can avoid this design when your application is not concurrent or this design doesn't bring you some other benefits. For example break your application to processes will lead to simpler design. In other case sometimes you can choose single process design and using "stateless" modules for performance or such. "stateless" module is best choice for very simply stateless (pure functional) tasks. gen_server is often best choice for thinks that seems naturally "process". You must use it when you want share something between processes (using processes can be constrained by scalability or concurrency).
Having used both models, I must say that using the provided gen_server helps me stay structured more easily. I guess this is why it is included in the OTP stack of tools: gen_server is a good way to get the repetitive boiler-plate out of the way.
If you have shared state over multiple processes you should probably go with gen_server and if the state is just local to one process a stateless module will do fine.
I suppose your invoices (or whatever they stand for) should be persistent, so they would end up in an ETS/Mnesia table anyway. If this is so, you should create a stateless module where you put your API for accessing the invoice table.

Resources