I'm failing to see how adding a supervisor to a crashing gen_tcp:listen-thread would actually restart that worker. Since crashing would render the port I wanna listen on useless for a brief moment. When a crash occur and I'm trying to manually restart my application I receive "{error,eaddrinuse}". I haven't implemented any supervisor for this worker yet since I fail to see how it would work.
How do I restart a gen_tcp:listen?
In most cases, as the listen socket is linked to the controlling process (the process that created it), a termination of this process will close the socket nicely and allow you to listen again on the same port.
For all other cases, you should pass {reuseaddr, true} option to gen_tcp:listen/2. Indeed, the listen socket of your application remains active for a brief moment after a crash, and this option allows you to reuse the address during this period.
Is the process managing the gen_tcp socket a gen_server? If so, it will make your life easier.
If it is a gen_server, add process_flag(trap_exit, true) to your init function. This makes it so that when your process "crashes", it calls the terminate/2 callback function prior to actually exiting the process. Using this method, you can manually close your listening socket in the terminate function, thereby avoiding the irritating port cleanup delay.
If you are not using a gen_server, the same principle still applies, but you must be much more explicit about catching your errors.
Related
I'm writing a TCP server in Erlang, using Ranch. The clients will reconnect immediately the connection is dropped, which means that one particular failure mode is listeners being started and killed dozens of times a second.
I'd like to detect this happening and publish statistics to statsd, for monitoring in production.
So, can I use something in Ranch to monitor when a listener is recycled? Or can I use something in Erlang to monitor process mortality for the entire node, without having to link to each process, and where those processes were started by some other supervisor, so I don't have a reference to them?
It's not a direct answer to my question, but I opted, instead, to have a separate process periodically polling ranch_server:count_connections(my_ref), and publish that to statsd.
I am just reading Manning's Erlang & OTP In Action. Very good book, I think. It contains a nice TCP server example but I'd like to write a UDP server. This is how I structured my app so far.
my_app % app behaviour
|-- my_sup % root supervisor
|-- my_server.erl % gen_server to open UDP connection and dispatch
|-- my_worker_sup % simple_one_to_one supervisor to start workers
|-- my_worker_server % gen_server worker
So, my_app starts my_sup, which in turn starts my_worker_sup and my_server. The UDP connection is opened in my_server in active mode such that handle_info/2 is invoked on each new UDP message in response to which I call my_worker_sup:start_child/2 to pass the message to a new worker process for processing. (The last call to start_child/2 is in fact, as per the book's recommendation, wrapped in an API function to hide some of the details, but this is essentially what happens.)
Am I suffering from OTP fever? Should the my_worker_server really implement the gen_server behaviour? Do I need my_worker_sup at all?
I set it up in like this so that I can use my_worker_sup as a factory via the start_child/2 call but I only use the worker's init/1 and handle_info(timeout,State) functions to first setup state and then to process the message before shutting the worker down.
Should I just spawn the worker directly? Is another behaviour better suited, perhaps?
Thanks,
HC
The key answer to this question is: "how do you want your application to crash?"
If a worker dies, then what should happen? If this should stop everything, including the UDP connection, then surely you can just spawn_link them under the my_server directly, no supervisor tree needed. But if you want them to be able to gracefully restart or something else, then the above diagram is usually better. Perhaps add a monitor on the workers from my_server so it can keep a book of who is alive.
In my utp erlang library, I have almost the same construction. A master handles the UDP socket and forwards to workers based on a routing table kept in ETS. Each worker keeps a connection state and can handle the incoming information.
Since you don't track state, then your best bet is probably to run via proc_lib:spawn_link and then hook them to the s_1_1 supervisor as transient processes. That way, you will force too many crashes to be propagated up the supervisor tree but allow them to exit with normal. This allows you to have them run exactly once.
Note that you could also handle everything directly in the my_server, but then you will not be able to process data concurrently. This may or may not be acceptable. The general rule is to spawn a new process when you have concurrent work that needs to be executed next to each other, blocks or otherwise behaves in some way.
I'm still kind of new to the erlang/otp world, so I guess this is a pretty basic question. Nevertheless I'd like to know what's the correct way of doing the following.
Currently, I have an application with a top supervisor. The latter will supervise workers that call gen_tcp:accept (sleeping on it) and then spawn a process for each accepted connection. Note: To this question, it is irrelevant where the listen() is done.
My question is about the correct way of making these workers (the ones that sleep on gen_tcp:accept) respect the otp design principles, in such a way that they can handle system messages (to handle shutdown, trace, etc), according to what I've read here: http://www.erlang.org/doc/design_principles/spec_proc.html
So,
Is it possible to use one of the available behaviors like gen_fsm or gen_server for this? My guess would be no, because of the blocking call to gen_tcp:accept/1. Is it still possible to do it by specifying an accept timeout? If so, where should I put the accept() call?
Or should I code it from scratch (i.e: not using an existant behavior) like the examples in the above link? In this case, I thought about a main loop that calls gen_tcp:accept/2 instead of gen_tcp:accept/1 (i.e: specifying a timeout), and immediately afterwards code a receive block, so I can process the system messages. Is this correct/acceptable?
Thanks in advance :)
As Erlang is event driven, it is awkward to deal with code that blocks as accept/{1,2} does.
Personally, I would have a supervisor which has a gen_server for the listener, and another supervisor for the accept workers.
Handroll an accept worker to timeout (gen_tcp:accept/2), effectively polling, (the awkward part) rather than receiving an message for status.
This way, if a worker dies, it gets restarted by the supervisor above it.
If the listener dies, it restarts, but not before restarting the worker tree and supervisor that depended on that listener.
Of course, if the top supervisor dies, it gets restarted.
However, if you supervisor:terminate_child/2 on the tree, then you can effectively disable the listener and all acceptors for that socket. Later, supervisor:restart_child/2 can restart the whole listener+acceptor worker pool.
If you want an app to manage this for you, cowboy implements the above. Although http oriented, it easily supports a custom handler for whatever protocol to be used instead.
I've actually found the answer in another question: Non-blocking TCP server using OTP principles and here http://20bits.com/article/erlang-a-generalized-tcp-server
EDIT: The specific answer that was helpful to me was: https://stackoverflow.com/a/6513913/727142
You can make it as a gen_server similar to this one: https://github.com/alinpopa/qerl/blob/master/src/qerl_conn_listener.erl.
As you can see, this process is doing tcp accept and processing other messages (e.g. stop(Pid) -> gen_server:cast(Pid,{close}).)
HTH,
Alin
I've set up a simple test-case at https://github.com/bvdeenen/otp_super_nukes_all that shows that an otp application:stop() actually kills all spawned processes by its children, even the ones that are not linked.
The test-case consists of one gen_server (registered as par) spawning a plain erlang process (registered as par_worker) and a gen_server (registered as reg_child), which also spawns a plain erlang process (registered as child_worker). Calling application:stop(test_app) does a normal termination on the 'par' gen_server, but an exit(kill) on all others!
Is this nominal behaviour? If so, where is it documented, and can I disable it? I want the processes I spawn from my gen_server (not link), to stay alive when the application terminates.
Thanks
Bart van Deenen
The application manual says (for the stop/1 function):
Last, the application master itself terminates. Note that all processes with the
application master as group leader, i.e. processes spawned from a process belonging
to the application, thus are terminated as well.
So I guess you cant modify this behavior.
EDIT: You might be able to change the group_leader of the started process with group_leader(GroupLeader, Pid) -> true (see: http://www.erlang.org/doc/man/erlang.html#group_leader-2). Changing the group_leader might allow you to avoid killing your process when the application ends.
I made that mistakes too, and found out it must happen.
If parent process dies, all children process dies no matter what it is registered or not.
If this does not happen, we have to track all up-and-running processes and figure out which is orphaned and which is not. you can guess how difficult it would be. You can think of unix ppid and pid. if you kill ppid, all children dies too. This, I think this must happen.
If you want to have processes independent from your application, you can send a messageto other application to start processes.
other_application_module:start_process(ProcessInfo).
in Erlang I have a supervisor-tree of processes, containing one that accepts tcp/ip connections. For each incoming connection I spawn a new process. Should this process be added to the supervisor tree or not?
Regards,
Steve
Yes, you should add these processes to the supervision heirarchy as you want them to be correctly/gracefully shutdown when your application is stopped. (Otherwise you end up leaking connections that will fail as the application infrastructure they depend on been shutdown).
You could create a simple_one_for_one strategy supervisor say yourapp_client_sup that has a child spec of {Id, {yourapp_client_connection, start_link_with_socket, []}, Restart, Shutdown, worker, temporary}. The temporary type here is important because there's normally no useful restart strategy for a connection handler - you can't connect out to the client to restart the connection. temporary here will cause the supervisor to report the connection handler exit but otherwise ignore it.
The process that does gen_tcp:accept will then create the connection handler process by doing supervisor:start_child(yourapp_client_sup, [Socket,Options,...]) rather than yourapp_client_sup:start_link(Socket, Options, ...). Ensure that the youreapp_client_connection:start_link_with_socket function starts the child via gen_server or proc_lib functions (a requirement of the supervisor module) and that the function transfers control of the socket to the child with gen_tcp:controlling_process otherwise the child won't be able to use the socket.
An alternate approach is to create a dummy yourapp_client_sup process that yourclient_connection_handler processes can link to at startup. The yourapp_client_sup process will just exist to propagate EXIT messages from its parent to the connection handler processes. It will need to trap exists and ignore all EXIT messages other than those from its parent. On the whole, I prefer to use the simple_one_for_one supervisor approach.
If you expect these processes to be many, it could be a good idea to add a supervisor under your main supervisor as to separate responsibility (and maybe use the simple_one_for_one setting to make things simpler, maybe even simpler than your current case).
The thing is, if you need to control these processes, it's always nice to have a supervisor. If it doesn't matter if they succeed or not, then you might not need one. But then again, I always argue that that is sloppy coding. ;-)
The only thing I wouldn't do, is to add them to your existing tree, unless it is very obvious where they come from and they're fairly few.