Data persistence when worker process dies, how? - erlang

I have worker processes that need gathered/calculated data as arguments on startup. The same data is needed again on restarts. Where should I put the initialization code? Inside the supervisor's init? Or inside the module's start_link, or init? Are there any best practices in Erlang when it comes to this?

If the gen_server component has critical state, or state which cannot be re-calculated or re-gathered, I generally avoid keeping the state in the gen_server itself. Instead I maintain the state in an external process or ETS table. If you take this approach, make sure the ETS table is either created by an external process that you are sure will not die (e.g. the application process), or created in the init function of the gen_server and handed off to an external process with ets:give_away/3 (of course, you would then need to check in the gen_server's init function whether the table already exists). Otherwise the ETS table will be destroyed when the process dies.
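As a rough illustration of the first option (a table owned by a long-lived external process), here is a minimal sketch. The module name state_keeper, the table name worker_state and the demo/0 function are made up for this example; a real system would put the owner under a supervisor rather than use a bare spawn.

```erlang
-module(state_keeper).
-export([start/0, demo/0]).

%% Hypothetical long-lived owner: it creates the table and then only waits.
%% The table is public so that other processes (the workers) can write to it.
start() ->
    Parent = self(),
    Pid = spawn(fun() ->
        ets:new(worker_state, [set, public, named_table]),
        Parent ! {self(), ready},
        receive stop -> ok end
    end),
    receive {Pid, ready} -> {ok, Pid} end.

demo() ->
    {ok, Keeper} = start(),
    %% A worker stores its state and then crashes.
    {Worker, Ref} = spawn_monitor(fun() ->
        ets:insert(worker_state, {counter, 42}),
        exit(crashed)
    end),
    receive {'DOWN', Ref, process, Worker, crashed} -> ok end,
    %% The data is still there, because the keeper owns the table.
    Result = ets:lookup(worker_state, counter),
    Keeper ! stop,
    Result.
```

A restarted worker would read worker_state in its init function instead of recomputing the data.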

Related

Why can't simple_one_for_one specify different child id when call start_child?

From the doc:
Note that when the restart strategy is simple_one_for_one, the list of child specifications must be a list with one child specification only. (The child specification identifier is ignored.) No child process is then started during the initialization phase, but all children are assumed to be started dynamically using supervisor:start_child/2.
What's the design consideration behind this? It doesn't stop each child process from actively calling register(<child_id>, ChildPid).
Registering a name for the PID of a child process has nothing to do with the supervisor.
Consider a bog-standard (non-dynamic) supervisor. The child specification gives the supervisor enough information to start the child, typically by calling child_module:start_link, but it is the implementation of child_module:start_link that determines how the process is started and whether a name is registered. A typical child_module:start_link implementation would be something like:
start_link() ->
    gen_server:start_link({local, server_name}, ?MODULE, [], []).
It's this call to gen_server:start_link/4 that registers the resulting gen_server process's PID under the name 'server_name'.
You could call gen_server:start_link/3 instead, in which case the gen_server process would have no name, unless you call erlang:register/2 in your init/1 behaviour implementation or something like that.
This is good because there's no reason to couple name registration with supervision. The name of a module or process is about that module and its service, how it's accessed and used, not about supervision strategy.
It is quite common for supervised processes to register names for themselves, thus becoming named services that any other process can access easily.
With simple_one_for_one supervision, however, supervised children would typically not have names, because they should be homogeneous (i.e. don't create a simple_one_for_one supervisor that dynamically starts various workers that all do different things; if they do different things then they almost certainly have different relative importance and should be under different supervisors), and so unique names are not useful or appropriate.
The reason you can't choose a different child identifier when calling start_child, then, is that the child identifier is really the ID of the child specification (i.e. the type of child), not the ID of the child process as such. Using a different child identifier would be saying 'this is a different type of process that does something different to the other one'. This fits with the requirement that the list of child specifications contains exactly one specification.
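To make this concrete, here is a small sketch of a simple_one_for_one supervisor with a single child specification whose children differ only in the extra arguments passed to supervisor:start_child/2. The module name room_sup and the functions start_room/1 and room_start_link/1 are invented for the example, and the worker is a bare process rather than a gen_server to keep it short.

```erlang
-module(room_sup).
-behaviour(supervisor).
-export([start_link/0, start_room/1, room_start_link/1]).
-export([init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    SupFlags = #{strategy => simple_one_for_one, intensity => 5, period => 10},
    %% Exactly one child specification; its id is ignored for this strategy.
    ChildSpec = #{id => room,
                  start => {?MODULE, room_start_link, []},
                  restart => temporary,
                  type => worker},
    {ok, {SupFlags, [ChildSpec]}}.

%% The list given here is appended to the argument list in the child spec,
%% so the supervisor ends up calling room_start_link(Name).
start_room(Name) ->
    supervisor:start_child(?MODULE, [Name]).

%% Hypothetical worker: an unnamed process that remembers its room name.
room_start_link(Name) ->
    {ok, spawn_link(fun() -> room_loop(Name) end)}.

room_loop(Name) ->
    receive
        {From, name} -> From ! {self(), Name}, room_loop(Name);
        stop -> ok
    end.
```

Calling room_sup:start_room(a) and room_sup:start_room(b) gives two children of the same type; what distinguishes them is their PID and the argument they were started with, not a child id.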

How does -on_load work in erlang?

I don't seem to fully understand how the -on_load directive works. In a module I have written, I have a function that initializes an ETS table and populates it with some data. This function works properly when I call it explicitly. However, I thought it would be nice if the ETS table were populated "automatically" when the module is loaded. But this does not seem to work, because ets:info(filesig) returns undefined after loading the module. The relevant code looks something like:
...
-on_load(init/0).
init() ->
    %% load filesig database into ETS
    {_, Signatures} = file:consult("path to a file"),
    ets:new(filesig, [set, protected, named_table]),
    ets:insert(filesig, Signatures),
    ok.
...
I've tested it from within the Erlang shell. Any hints as to what I am doing wrong?
The manual says that this code runs in a newly spawned process which terminates as soon as the function returns.
The ETS table you create gets deleted once the owning process terminates. This is standard ETS behavior. Here's what the ets man page says about it:
Note that there is no automatic garbage collection for tables. Even if
there are no references to a table from any process, it will not
automatically be destroyed unless the owner process terminates. It can
be destroyed explicitly by using delete/1. The default owner is the
process that created the table. Table ownership can be transferred at
process termination by using the heir option or explicitly by calling
give_away/3.
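The effect, and the heir option mentioned in the man page excerpt, can be sketched as follows. This is only an illustration: the module name filesig_demo and the table names are invented, a throwaway process stands in for the on_load process, and the calling process stands in for the long-lived heir.

```erlang
-module(filesig_demo).
-export([short_lived/0, with_heir/0]).

%% The table is created by a process that exits right away, as with on_load.
short_lived() ->
    {Pid, Ref} = spawn_monitor(fun() ->
        ets:new(filesig_tmp, [set, protected, named_table]),
        ets:insert(filesig_tmp, {sig, <<"abc">>})
    end),
    receive {'DOWN', Ref, process, Pid, _} -> ok end,
    %% The owner is dead, so the table no longer exists.
    ets:info(filesig_tmp).

%% Same creator, but the table names the calling process as its heir.
with_heir() ->
    Heir = self(),
    {Pid, Ref} = spawn_monitor(fun() ->
        ets:new(filesig_kept, [set, protected, named_table, {heir, Heir, []}]),
        ets:insert(filesig_kept, {sig, <<"abc">>})
    end),
    receive {'DOWN', Ref, process, Pid, _} -> ok end,
    %% When the creator dies, ownership moves to the heir.
    receive {'ETS-TRANSFER', _Tab, Pid, []} -> ok end,
    Result = ets:lookup(filesig_kept, sig),
    ets:delete(filesig_kept),
    Result.
```

In the on_load case the heir would have to be a process that is already running and stays alive; alternatively, the table can simply be created by such a process in the first place.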

how to handle termination of gen_fsm

I have a MAIN process that spawns an implementation of the gen_fsm behavior, but this MAIN process is not an implementation of the supervisor behavior; it's just another module.
Let's say the implementation of gen_fsm is called GAME_ROOM.
My case is like this:
Whenever there are 3 people ready, the MAIN process spawns a new GAME_ROOM.
I use the gen_fsm:start_link function to start a new GAME_ROOM, so if the GAME_ROOM exits with an error, my MAIN process can spawn a new one to replace the dead process.
I managed to make my MAIN process detect the EXIT event of every GAME_ROOM that goes down.
The problem is: I need to restore the state of each dead GAME_ROOM in the new one.
My question is: how can I use gen_fsm's terminate function to pass the latest state of the gen_fsm to my MAIN process, so that when I respawn a new GAME_ROOM, I can pass it that state?
One simple way would be for GAME_ROOM terminate/3 to send a message with the necessary state information to MAIN. For this to work the GAME_ROOM must know the pid of MAIN (easy) and you have to be certain that terminate/3 is really called.
Read about process_flag(trap_exit, true) and handling 'EXIT' messages in handle_info.
First of all, I would really suggest looking at using supervisors in your implementation, to avoid re-inventing the wheel.
One possibility would be to create an ETS table in your MAIN process, so that your gen_fsms can store data in it that survives process crashes.
My belief is that if GAME_ROOM exits because of an error, there is nothing to save: how do you know your state is still valid? If you did know, you would trap the error inside GAME_ROOM.
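The "send the state from terminate" suggestion could look roughly like the sketch below. To keep it short it uses gen_server and its terminate/2 callback instead of gen_fsm and terminate/3; the idea is the same. The module name game_room, the crash/1 helper and the shape of the room_state message are all invented for the example.

```erlang
-module(game_room).
-behaviour(gen_server).
-export([start_link/2, crash/1]).
-export([init/1, handle_call/3, handle_cast/2, terminate/2]).

%% Main is the pid that should receive the last known state.
start_link(Main, Players) ->
    gen_server:start_link(?MODULE, {Main, Players}, []).

%% Hypothetical helper that makes the room stop with an error reason.
crash(Room) ->
    gen_server:cast(Room, crash).

init({Main, Players}) ->
    {ok, #{main => Main, players => Players}}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(crash, State) ->
    {stop, room_failed, State};
handle_cast(_Msg, State) ->
    {noreply, State}.

%% Called when a callback stops or crashes the server. It is not called if
%% the process is killed by an exit signal that it does not trap.
terminate(Reason, #{main := Main} = State) ->
    Main ! {room_state, self(), Reason, maps:remove(main, State)},
    ok.
```

Because the room is started with start_link, the MAIN process has to trap exits (or use monitors) to survive the crash and then start a replacement with the state it received.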

Registering a child in the process that initiated the start_child call

I have a logic module that tells a supervisor to start child processes. I need to store those children's pids in the logic module's state. But I also need to update a child's pid if the supervisor restarts it.
So I can't use the pid returned from the start_child call, since that only gives me the pid from the first start, not from restarts. Right now I make the child process call a register function in the logic module (which updates the state with the new pid) from the child's init function. That way the logic module can update the pid in its state whenever a process is restarted. The logic module is a gen_server and I'm doing a cast when I register the child process.
Can anyone see a problem with this, and is there a more "proper" way of doing it?
One problem is that you hold a ChildPid and the child might be dead by now. Sending it a message through a cast then means the message is lost, and a call will make you exit with a noproc reason unless you catch the exit. Your solution must take into account that a pid might be long gone the instant you send a message. Usually you handle this by ignoring the lost message, by crashing yourself, or by remedying the problem and then going on.
There are a couple of options. This is a loose list:
Do as you do. Let the children register themselves.
Let the logic module have a monitor on the child. That way you know if it dies.
Use Erlang Solutions gproc module: https://github.com/esl/gproc which gives you a neat interface to an ETS table keeping track of the information. Note that you can look up a pid in gproc and await its arrival if the process is just restarting.
Use supervisor:which_children to find the relevant child.
Roll your own ETS table as a variant of gproc.
Local names have to be atoms, but globally registered names can be any term (they are stored internally in an ETS table looking somewhat like gproc's, see the global_name_server in kernel/stdlib). Use the global structure to track the pids in question.
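The monitor option from the list above can be sketched in a few lines. The module name child_watch and the map used to track pids are assumptions for the example; in the real logic module the 'DOWN' message would arrive in handle_info.

```erlang
-module(child_watch).
-export([demo/0]).

demo() ->
    Child = spawn(fun() -> receive stop -> ok end end),
    %% Monitoring gives a 'DOWN' message when the child terminates.
    Ref = erlang:monitor(process, Child),
    Known = #{Ref => Child},
    Child ! stop,
    receive
        {'DOWN', Ref, process, Child, Reason} ->
            %% Drop the stale pid from the tracked set.
            {Reason, maps:remove(Ref, Known)}
    after 1000 ->
        timeout
    end.
```

The monitor tells the logic module that the old pid is gone; the new pid still has to come from somewhere, for instance from the child registering itself as in the first option.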

How Do You Determine The PID of the Parent of a Process

I have a process in Erlang that is supposed to do something immediately after it is spawned, then send the result back to the parent when it is finished. How do I figure out the PID of the process that spawned it?
You should pass self() to the child as one of the arguments to the entry function.
spawn_link(?MODULE, child, [self()]).
@Eridius' answer is the preferred way to do it. Requiring a process to register a name may have unintended side effects, such as increasing the visibility of the process, not to mention the hassle of coming up with unique names when you have lots of processes.
The best way is definitely to pass it as an argument to the function called to start the child process. If you are spawning funs, which generally is a Good Thing to do, be careful of doing:
spawn_link(fun () -> child(self()) end)
which will NOT do what you intended. (Hint: when is self() called?)
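A small sketch of the difference, with invented names (the module parent_pid and the functions right/0 and wrong/0 are only for illustration): inside the fun, self() is evaluated by the new process, so the parent's pid has to be bound to a variable before spawning.

```erlang
-module(parent_pid).
-export([right/0, wrong/0]).

%% Hypothetical child: reports back to whatever pid it was given.
child(Parent) ->
    Parent ! {result, self(), done}.

right() ->
    Parent = self(),                 % evaluated in the parent
    Child = spawn_link(fun() -> child(Parent) end),
    receive {result, Child, done} -> ok after 1000 -> timeout end.

wrong() ->
    %% self() runs inside the child here, so the child messages itself
    %% and the parent never receives anything.
    _Child = spawn_link(fun() -> child(self()) end),
    receive {result, _, done} -> ok after 100 -> timeout end.
```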
Generally you should avoid registering a process, i.e. giving it a global name, unless you really want it to be globally known. Spawning a fun means that you don't have to export the spawned function as you should generally avoid exporting functions that aren't meant to be called from other modules.
You can use the BIF register to give the spawning / parent process a name (an atom) then refer back to the registered name from other processes.
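child(Args) ->
    Result = do_something(Args),   %% placeholder for the real work
    %% Then send the result to the registered parent
    parent ! {result, Result}.
...
register(parent, self()),
spawn(?MODULE, child, [Args]).     %% child/1 must be exported for this form of spawn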
See Getting Started With Erlang §3.3 and The Erlang Reference Manual §10.3.
