I've set up a simple test-case at https://github.com/bvdeenen/otp_super_nukes_all that shows that an otp application:stop() actually kills all spawned processes by its children, even the ones that are not linked.
The test-case consists of one gen_server (registered as par) spawning a plain erlang process (registered as par_worker) and a gen_server (registered as reg_child), which also spawns a plain erlang process (registered as child_worker). Calling application:stop(test_app) does a normal termination on the 'par' gen_server, but an exit(kill) on all others!
Is this nominal behaviour? If so, where is it documented, and can I disable it? I want the processes I spawn from my gen_server (not link), to stay alive when the application terminates.
Thanks
Bart van Deenen
The application manual says (for the stop/1 function):
Last, the application master itself terminates. Note that all processes with the
application master as group leader, i.e. processes spawned from a process belonging
to the application, thus are terminated as well.
So I guess you cant modify this behavior.
EDIT: You might be able to change the group_leader of the started process with group_leader(GroupLeader, Pid) -> true (see: http://www.erlang.org/doc/man/erlang.html#group_leader-2). Changing the group_leader might allow you to avoid killing your process when the application ends.
I made that mistakes too, and found out it must happen.
If parent process dies, all children process dies no matter what it is registered or not.
If this does not happen, we have to track all up-and-running processes and figure out which is orphaned and which is not. you can guess how difficult it would be. You can think of unix ppid and pid. if you kill ppid, all children dies too. This, I think this must happen.
If you want to have processes independent from your application, you can send a messageto other application to start processes.
other_application_module:start_process(ProcessInfo).
Related
Problem: I have a type FlyServer and need to iterate through all the Fly processes. For various computations on the server.
How do I accomplish this?
One option is to have a GenServer list of all the FlyServer processes. But what if it crashes? And what if a player crashes and for whatever reason the GenServer keeping track of the processes isn't notified --- chime in if that scenario is unrealistic please.
I advise you to start your servers using a supervisor with a call to supervisor:start_child/2. The supervisor should use the strategy simple_one_for_one which is meant to create and supervise processes of the same kind.
Then you can get an updated list of all the chidren using the function supervisor:which_children/1
Every time a Fly process contacts the server, you can add its pid to a list, where the list is part of the gen_server's State.
The server can then monitor the Fly process, which means that when a Fly process terminates, the server will get sent a special message.
The server can implement a receive clause that pattern matches the special message and then removes the terminated process's pid from the list.
One option is to have a GenServer list of all the FlyServer processes.
But what if it crashes?
Then terminate(Reason, State) will be called in the callback module, which can save State to an ets, dets, or mnesia table. Of course, if someone trips over the cord that connects the server running the FlyServer to an electrical outlet, then execution will immediately halt and terminate() will not be called. See distributed erlang for solutions.
Can someone explain what's the difference between gen_server:start() and gen_server:start_link()?
I've been told that it's something about multi threading stuff.
EDIT:
If my gen_server is called from multiple threads, will it execute them all at once ? Or will it create concurrency between these threads?
Both functions start new gen_server instances as children of the calling process, but they differ in that the gen_server:start_link/3,4 atomically starts a gen_server child and links it to its parent process. Linking means that if the child dies, the parent will by default also die. Supervisors are parent processes that use links to take specific actions when their child processes exit abnormally, typically restarting them.
Other than the linking involved in the gen_server:start_link case, there are no multi-process aspects involved in these calls. Regardless of whether you use gen_server:start or gen_server:start_link to start a new gen_server, the new process has a single message queue, and it receives and processes those messages one at a time. There is nothing about gen_server:start_link that causes the new gen_server process to behave or perform differently than it would if started with gen_server:start.
When you use gen_server:start_link new process becomes "child" of calling process - it's part of supervision tree. It allows calling process to be notified if gen_server process dies.
Using gen_server:start will spawn process outside of supervision tree.
Nice description of supervision in Erlang is here: http://learnyousomeerlang.com/supervisors
I have several gen_server workers periodically requesting some information from hardware sensors. Sensors may temporary fail, it is normal. If sensor fails worker terminates with an exception.
All workers are spawned form supervisor with simple_one_to_one strategy. Also I have a control gen_server, which can start and stop workers and also recives 'DOWN' messages.
So now I have two problems:
If worker is restarted by supervisor its state is lost, which is not acceptable to me. I need to recreate the worker with the same state.
If the worker is failing several times in period of time something serious has happened with the sensors and it requires the operator's attention. Thus I need to give up restarting the worker and send a message to event handlers. But the default behaviuor of supervisor is terminate after exhaust process restart limit.
I see two solutions:
Set the type of the processes in the supervisor as temporary and control them and restart them in control gen_server. But this is exactly what supervisor should do, so I'm reinventing the wheel.
Create a supervisor for each worker under the main supervisor. This exactly solves my second problem, but the state of workers is lost after restart, thus I need some storage like ets table storing the states of workers.
I am very new to Erlang, so I need some advice to my problem, as to which (if any) solution is the best. Thanks in advance.
If worker is restarted by supervisor its state is lost, which is not
accertable to me. I need to recreate worker with the same state.
If you need the process state to persist the process lifecycle, you need to store it elsewhere, for example in an ETS table.
If the worker is failing several times in particular amount of time
something serious happened with sensors and it require operator's
attention. Thus I need to give up restarting worker and send some
message for event handlers. But default behaviuor of supervisor is
terminate after exhaust process restart limit.
Correct. Generally speaking, the less logic you put into your supervisor, the better it is. Supervisors should just supervise child processes and that's it. But you could still monitor your supervisor and be notified whenever your supervisor gave up (just an idea). This way you can avoid re-invent the wheel and use the supervisor to manage the children.
In all Erlang supervisor examples I have seen yet, there usually is a "master" supervisor who supervises the whole tree (or at least is the root node in the supervisor tree). What if the "master"-supervisor breaks? How should the "master"-supervisor be supervised?? any typical pattern?
The top supervisor is started in your application start/2 callback using start_link, this means that it links with the application process. If the application process receives an exit signal from the top supervisor dying it does one of two things:
If the application is started as an permanent application the entire node i terminated (and maybe restarted using HEART).
If the application is started as temporary the application stops running, no restart attempts will be made.
Typically Supervisor is set to "only" supervise other processes. Which mens there is no user written code which is executed by Supervisor - so it very unlikely to crash.
Of course, this cannot be enforced ... So typical pattern is to not have any application specific logic in Supervisor ... It should only Supervise - and do nothing else.
Good question. I have to concur that all of the examples and tutorials mostly ignore the issue - even if occasionally someone mentions the issue (without providing an example solution):
If you want reliability, use at least two computers, and then make them supervise each other. How to actually implement that with OTP is (with the current state of documentation and tutorials), however, appears to be somewhere between well hidden and secret.
in Erlang I have a supervisor-tree of processes, containing one that accepts tcp/ip connections. For each incoming connection I spawn a new process. Should this process be added to the supervisor tree or not?
Regards,
Steve
Yes, you should add these processes to the supervision heirarchy as you want them to be correctly/gracefully shutdown when your application is stopped. (Otherwise you end up leaking connections that will fail as the application infrastructure they depend on been shutdown).
You could create a simple_one_for_one strategy supervisor say yourapp_client_sup that has a child spec of {Id, {yourapp_client_connection, start_link_with_socket, []}, Restart, Shutdown, worker, temporary}. The temporary type here is important because there's normally no useful restart strategy for a connection handler - you can't connect out to the client to restart the connection. temporary here will cause the supervisor to report the connection handler exit but otherwise ignore it.
The process that does gen_tcp:accept will then create the connection handler process by doing supervisor:start_child(yourapp_client_sup, [Socket,Options,...]) rather than yourapp_client_sup:start_link(Socket, Options, ...). Ensure that the youreapp_client_connection:start_link_with_socket function starts the child via gen_server or proc_lib functions (a requirement of the supervisor module) and that the function transfers control of the socket to the child with gen_tcp:controlling_process otherwise the child won't be able to use the socket.
An alternate approach is to create a dummy yourapp_client_sup process that yourclient_connection_handler processes can link to at startup. The yourapp_client_sup process will just exist to propagate EXIT messages from its parent to the connection handler processes. It will need to trap exists and ignore all EXIT messages other than those from its parent. On the whole, I prefer to use the simple_one_for_one supervisor approach.
If you expect these processes to be many, it could be a good idea to add a supervisor under your main supervisor as to separate responsibility (and maybe use the simple_one_for_one setting to make things simpler, maybe even simpler than your current case).
The thing is, if you need to control these processes, it's always nice to have a supervisor. If it doesn't matter if they succeed or not, then you might not need one. But then again, I always argue that that is sloppy coding. ;-)
The only thing I wouldn't do, is to add them to your existing tree, unless it is very obvious where they come from and they're fairly few.