When a particular message is received by my gen_event manager process, I want it to stop after all handlers have handled it and before they get and handle any other events. The only way I could find is this:
-module(manager).
...
stop(Reason) ->
    gen_event:sync_notify(manager, {stop, Reason}),
    gen_event:stop(manager).
But this requires all handlers to return remove_handler from handle_event({stop, Reason}, State), otherwise they could handle an event sent from a different process after sync_notify and before stop. I would prefer to have an approach that imposes no requirements on handlers.
As far as I know, there is no other way to guarantee that handlers process exactly that one last event than the approach you're already using, short of just plainly killing the event manager with exit(Pid, Reason) or ordering it to be shut down by its own supervisor.
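For completeness, those blunter alternatives look roughly like this (a sketch only; the supervisor name my_sup and the child id manager are assumptions):

%% Brutally kill the event manager; handlers get no chance to clean up.
kill_manager() ->
    exit(whereis(manager), kill).

%% Or ask the manager's own supervisor to shut it down; 'my_sup' and the
%% child id 'manager' are hypothetical names.
stop_via_supervisor() ->
    supervisor:terminate_child(my_sup, manager).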
I'm writing an Erlang application that requires actively polling some remote resources, and I want the process that does the polling to fit into the OTP supervision trees and support all the standard facilities like proper termination, hot code reloading, etc.
However, the two default behaviours, gen_server and gen_fsm, seem to only support callback-based operation. I could abuse gen_server by making calls to self, or abuse gen_fsm by having a single state that always loops to itself with a timeout of 0, but I'm not sure that's safe (i.e. doesn't exhaust the stack or accumulate unread messages in the mailbox).
I could make my process into a special process and write all that handling myself, but that effectively makes me reimplement the Erlang equivalent of the wheel.
So is there a behavior for code like this?
loop(State) ->
    NewState = do_stuff(State), % do the work without waiting to be called
    loop(NewState).
And if not, is there a safe way to trick default behaviours into doing this without exhausting the stack or accumulating messages over time or something?
The standard way of doing that in Erlang is by using erlang:send_after/3. See this SO answer and also this example implementation.
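To make the pattern concrete, here is a minimal gen_server sketch built around erlang:send_after/3; the poll_resource/1 function and the 5-second interval are placeholders:

-module(poller).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

-define(INTERVAL, 5000).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

init([]) ->
    %% Schedule the first poll instead of doing the work in init/1.
    erlang:send_after(?INTERVAL, self(), poll),
    {ok, #{}}.

handle_info(poll, State) ->
    NewState = poll_resource(State),            % hypothetical polling function
    erlang:send_after(?INTERVAL, self(), poll), % re-arm the timer
    {noreply, NewState};
handle_info(_Msg, State) ->
    {noreply, State}.

handle_call(_Req, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.
terminate(_Reason, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.

poll_resource(State) ->
    %% Placeholder for the actual remote poll.
    State.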
Is it possible that you could employ an essentially non-OTP-compliant process? To be a good OTP citizen you do ideally want to make your long-running processes gen_servers or gen_fsms, but sometimes you have to look beyond the standard-issue rule book and consider why the rules exist.
What if, for example, your supervisor starts your gen_server, and your gen_server spawns another process (let's call it the active_poll process), and they link to each other so that they have shared fate (if one dies the other dies)? The active_poll process is now indirectly supervised by the supervisor that spawned the gen_server, because if it dies, so will the gen_server, and they will both get restarted. The only problem you really have to solve now is code upgrade, but this is not too difficult: your gen_server gets a code_change callback call when the code is to be upgraded, and it can simply send a message to the active_poll process, which can make an appropriate fully qualified function call, and bingo, it's running the new code.
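A rough sketch of that arrangement (module and function names such as active_poll_owner and do_poll/1 are made up for illustration):

-module(active_poll_owner).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).
-export([poll_loop/1]).   % exported so the fully qualified call works

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

init([]) ->
    %% Link so the poller and the gen_server share fate.
    Pid = spawn_link(fun() -> poll_loop(#{}) end),
    {ok, #{poller => Pid}}.

code_change(_OldVsn, #{poller := Pid} = State, _Extra) ->
    Pid ! upgrade,                      % tell the poller to jump to new code
    {ok, State}.

handle_call(_Req, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.
handle_info(_Msg, State) -> {noreply, State}.
terminate(_Reason, _State) -> ok.

%% The non-OTP polling loop, indirectly supervised via the link.
poll_loop(PollState) ->
    NewPollState = do_poll(PollState),  % hypothetical polling work
    receive
        upgrade ->
            %% Fully qualified call picks up the newly loaded code.
            ?MODULE:poll_loop(NewPollState)
    after 1000 ->
            poll_loop(NewPollState)
    end.

do_poll(PollState) ->
    %% Placeholder for the actual poll.
    PollState.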
If this doesn't suit you for some reason and/or you MUST use gen_server/gen_fsm/similar directly...
I'm not sure that writing a 'special process' really gives you very much. If you wrote a special process correctly, such that it is in theory compliant with OTP design principles, it could still be ineffective in practice if it blocks or busy-waits in a loop somewhere and doesn't invoke sys when it should, so you really have at most a small optimisation over using gen_server/gen_fsm with a zero timeout (or with an async message handler which does the polling and sends a message to self to trigger the next poll).
If whatever you are doing to actively poll can block (such as a blocking socket read, for example), this is really big trouble, as a gen_server, gen_fsm or special process will all be stopped from fulfilling their usual obligations (which they would normally be able to fulfil either because the callback returns, in the case of gen_server/gen_fsm, or because receive is called and the sys module invoked explicitly, in the case of a special process).
If what you are doing to actively poll is non-blocking though, you can do it, but if you poll without any delay it effectively becomes a busy wait (not quite, because the loop will include a receive call somewhere, which means the process will yield, giving the scheduler a voluntary opportunity to run other processes, but it's not far off, and it will still be a relative CPU hog). If you can have even a 1 ms delay between each poll, that makes a world of difference versus polling as rapidly as you can. It's not ideal, but if you MUST, it'll work. So use a timeout (as big as you can without it becoming a problem), or have an async message handler which does the polling and sends a message to self to trigger the next poll.
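For the timeout variant, a minimal gen_server sketch could look like this; poll/1 and the 1-second interval are placeholders, and note that every callback clause has to re-arm the timeout because it only fires when no other message arrives:

-module(timeout_poller).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

-define(POLL_TIMEOUT, 1000).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

init([]) ->
    %% The third tuple element asks gen_server to deliver a 'timeout'
    %% message if nothing arrives within ?POLL_TIMEOUT ms.
    {ok, #{}, ?POLL_TIMEOUT}.

handle_info(timeout, State) ->
    NewState = poll(State),                 % hypothetical non-blocking poll
    {noreply, NewState, ?POLL_TIMEOUT};     % re-arm the timeout
handle_info(_Msg, State) ->
    {noreply, State, ?POLL_TIMEOUT}.

handle_call(_Req, _From, State) -> {reply, ok, State, ?POLL_TIMEOUT}.
handle_cast(_Msg, State) -> {noreply, State, ?POLL_TIMEOUT}.
terminate(_Reason, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.

poll(State) ->
    %% Placeholder for the actual work.
    State.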
I'm using OTP supervisor behaviour to supervise and restart child processes. However when the child dies I want to restart it with the same state it had before the crash.
If I write my own custom supervisor, I can just receive the {'EXIT', Pid, Reason} message and act upon it. When using the OTP supervisor behaviour, however, it is all managed by OTP and I have no control over it. The only callback function I implement is init.
Is there any standard approach in case like this? How to customise the state of a child being restarted dynamically by the otp supervisor? How to get Pid of the terminating process using OTP? Or maybe its possible to get the state of the child just before termination, and then restore the child to the same state it had before it crashed?
Restarting with the same state is possibly not a good idea. A bad state probably led the process to crash in the first place, and if you restart it with the same state it will crash again. But if you do want this, use an external resource to keep the state (such as ETS or Mnesia).
Without knowing any details about what you are doing, I can imagine a world where the following makes sense:
the supervisor creates an ETS table and passes the table identifier to each child
a child process starts and, based on some relevant attribute of the child, consults the ETS table to look for state to load
every time a child's state changes it writes it to the ETS table
So, if I had 12 child processes representing the 12 Tribes of Cobol each would use its name as the key to the ETS table to look for state left behind by a previous incarnate upon starting. And each process would update the table (again using its name as the key) whenever its state changed.
The supervisor will automatically restart a killed child and step 2, above, would be executed in the child's init method. Step 3 would be dealt with in a child's handle_call, handle_cast and handle_info methods (I am making some assumptions about the nature of your processes). There are a number of restart strategies available via the supervisor that can even restart siblings if desired.
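A minimal sketch of such a child (the module name tribe_worker and the table name worker_state are made up; the supervisor is assumed to have created the table with ets:new(worker_state, [named_table, public, set]) so it outlives the children):

-module(tribe_worker).
-behaviour(gen_server).

-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link(Name) ->
    gen_server:start_link(?MODULE, Name, []).

init(Name) ->
    %% Look for state left behind by a previous incarnation.
    State = case ets:lookup(worker_state, Name) of
                [{Name, Saved}] -> Saved;      % pick up where we left off
                []              -> #{}         % fresh start
            end,
    {ok, {Name, State}}.

handle_cast(Msg, {Name, State}) ->
    NewState = State#{last_msg => Msg},            % hypothetical state change
    ets:insert(worker_state, {Name, NewState}),    % persist every change
    {noreply, {Name, NewState}}.

handle_call(_Req, _From, S) -> {reply, ok, S}.
handle_info(_Msg, S) -> {noreply, S}.
terminate(_Reason, _S) -> ok.
code_change(_OldVsn, S, _Extra) -> {ok, S}.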
Hope this gives you some thoughts.
I think this sort of customization of the OTP supervisor behaviour can't be done easily. The way OTP supervisors are designed forces me to follow some strict design practices. The most important one in this case is that a supervisor shouldn't do anything apart from monitoring its children and restarting them in case of abnormal termination; there should be no additional logic in the supervisor, so as not to introduce bugs into supervisors, which are a critical part of the supervision tree and its fault tolerance.
when the child dies I want to restart it with the same state it had before the crash
- this is bad practice in general, because the child might have died precisely because of corrupted state, and restarting it with the same state in such a case will surely cause problems again
Is there any standard approach in case like this?
Customizing the state of children within the supervisor before restarting them goes against supervisor design practice. This kind of task is therefore usually done differently, for example by introducing another process, such as a gen_server, that is responsible for starting children via the supervisor (supervisor:start_child) and maintaining monitors on all of them. This additional process can perform any required customization before starting a new child.
How to get Pid of the terminating process using OTP?
- in the additional process that starts children via supervisor:start_child you can monitor them and then listen for 'DOWN' messages. For example, in the case of a gen_server you would use a handle_info clause such as:
handle_info({'DOWN', Ref, process, Pid, _Reason}, S) ->
    handle_down_worker(Ref, Pid, S).
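A fuller sketch of such a manager process might look like this (the supervisor name worker_sup is an assumption, and the children are assumed to be declared 'temporary' so that only this manager restarts them):

-module(worker_manager).
-behaviour(gen_server).

-export([start_link/0, start_worker/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

start_worker(Args) ->
    gen_server:call(?MODULE, {start_worker, Args}).

init([]) ->
    {ok, #{workers => #{}}}.

handle_call({start_worker, Args}, _From, S) ->
    {Pid, NewS} = do_start(Args, S),
    {reply, {ok, Pid}, NewS}.

handle_info({'DOWN', Ref, process, Pid, _Reason}, S) ->
    handle_down_worker(Ref, Pid, S);
handle_info(_Msg, S) ->
    {noreply, S}.

handle_cast(_Msg, S) -> {noreply, S}.
terminate(_Reason, _S) -> ok.
code_change(_OldVsn, S, _Extra) -> {ok, S}.

do_start(Args, #{workers := Workers} = S) ->
    {ok, Pid} = supervisor:start_child(worker_sup, [Args]),
    Ref = erlang:monitor(process, Pid),     % 'DOWN' arrives whatever the reason
    {Pid, S#{workers := Workers#{Ref => Args}}}.

handle_down_worker(Ref, _Pid, #{workers := Workers} = S) ->
    %% Restart the worker ourselves, possibly with customized arguments.
    Args = maps:get(Ref, Workers),
    {_NewPid, NewS} = do_start(Args, S#{workers := maps:remove(Ref, Workers)}),
    {noreply, NewS}.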
Or maybe its possible to get the state of the child just before termination, and then restore the child to the same state it had before it crashed?
- Correct me if I'm wrong, but I don't think it is possible in Erlang to send, along with the 'DOWN' message, the state the child had just before termination. If it were possible, I could just handle a message similar to {'DOWN', Pid, Reason, State} and restart the process with the same state, or part of it. But then, how could you preserve the state of a child that dies suddenly, for example one killed with exit(Pid, kill)? I doubt that would be possible.
I want my FSM to terminate whenever no event arrives within a specified amount of time, in every state.
I can achieve this only for the case where no event arrives right after FSM creation, by specifying a timeout value in the init callback, but I would like this functionality to work in all of the states as well.
Any easy & quick solution?
Best Regards
Matt
You can set a timeout in the return tuple of each state callback: {next_state, NextStateName, NewStateData, Timeout}. See the gen_fsm documentation for more details. But it only works when no other message arrives at the gen_fsm in the meantime, so it is suitable only if, for example, you want to terminate the process when probably nobody is communicating with it. If you want hard limits (for protocols, for example) you should use erlang:send_after/3 or erlang:start_timer/3 and also handle timer cancellation and expiry yourself.
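A minimal sketch of the timeout-tuple approach, with a made-up two-state FSM and a 10-second idle limit:

-module(idle_fsm).
-behaviour(gen_fsm).

-export([start_link/0]).
-export([init/1, waiting/2, active/2,
         handle_event/3, handle_sync_event/4, handle_info/3,
         terminate/3, code_change/4]).

-define(IDLE, 10000).

start_link() ->
    gen_fsm:start_link(?MODULE, [], []).

init([]) ->
    %% The timeout also covers the period right after creation.
    {ok, waiting, #{}, ?IDLE}.

%% Every state clause re-arms the timeout; 'timeout' arrives when no
%% event was received within ?IDLE ms while in that state.
waiting(timeout, StateData) ->
    {stop, normal, StateData};
waiting(go, StateData) ->
    {next_state, active, StateData, ?IDLE}.

active(timeout, StateData) ->
    {stop, normal, StateData};
active(pause, StateData) ->
    {next_state, waiting, StateData, ?IDLE}.

handle_event(_Event, StateName, StateData) ->
    {next_state, StateName, StateData, ?IDLE}.
handle_sync_event(_Event, _From, StateName, StateData) ->
    {reply, ok, StateName, StateData, ?IDLE}.
handle_info(_Info, StateName, StateData) ->
    {next_state, StateName, StateData, ?IDLE}.
terminate(_Reason, _StateName, _StateData) -> ok.
code_change(_OldVsn, StateName, StateData, _Extra) ->
    {ok, StateName, StateData}.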
I have a gen_server running which must clean up its state whenever it is stopped normally or crashes unexpectedly. The cleanup basically consists of deleting a few files.
At the moment, whether the gen_server crashes or is stopped normally, the cleanup is done in terminate/2.
Is there any reason why terminate/2 would not be called if the gen_server crashes?
Should some other process be monitoring the gen_server, ready to do the cleanup if it dies unexpectedly?
So, the code is like this:
terminate(normal, State) ->
    % Invoked when the process stops normally
    cleanup(State);          % cleanup/1 is a placeholder that deletes the files
terminate(_Error, State) ->
    % Invoked when the process crashes
    cleanup(State).
EDIT: I found this thread on the official mailing list that talks about the same thing:
http://groups.google.com/group/erlang-programming/browse_thread/thread/9a1ba2d974775ce8
As Adam says below, if we want to avoid trapping exits in the gen_server, we can use different approaches.
But if we trap exits, terminate/2 seems to be a safe place to do the cleanup, as it will always be called. Furthermore, we must correctly handle the 'EXIT' reasons passed to terminate/2 and the 'EXIT' messages delivered to handle_info/2, propagating the errors correctly between workers and supervisors.
terminate/2 is called when a crash occurs inside the gen_server, even if it doesn't trap exits; it will not be called if the process receives an exit signal from some other process linked to it. If you need to clean up in that case as well, the gen_server should trap exits (using process_flag(trap_exit, true)).
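A minimal sketch of that setup, assuming the files to delete are kept in the process state (other gen_server callbacks omitted):

init(Files) ->
    %% Trap exits so that an exit signal from the parent (e.g. a
    %% supervisor shutdown) results in terminate/2 being called.
    process_flag(trap_exit, true),
    {ok, #{files => Files}}.

terminate(_Reason, #{files := Files}) ->
    %% Runs on normal stop, on internal crashes, and, because we trap
    %% exits, on shutdown requests from the parent as well.
    lists:foreach(fun file:delete/1, Files),
    ok.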
This behavior is a bit unfortunate because it makes it difficult to write a reliable shutdown procedure for a gen_server process. Also, it is not a good habit to trap exits just for the sake of being able to run terminate/2, since you might catch a lot of other errors which makes it harder to debug the system.
I would consider three options:
Handle the left over files when the next instance of the process starts (for example, in init/1)
Trap exits, clean up the files, and then crash again with the same reason
Have a 3rd process which monitors the gen_server whose only purpose is to clean up the files
Option 1 is probably the best, since at least the code doesn't trap exits and you get persistent state for free. Option 2 is not so nice for the reason described above: it can hide and obscure other errors. Option 3 is messy because the cleanup process might not be done before the gen_server is started again.
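For example, option 1 could look roughly like this, assuming the process keeps its files under a known directory ("/tmp/myapp_work" is made up; only init/1 shown):

-define(WORK_DIR, "/tmp/myapp_work").

init([]) ->
    %% Clean up anything left behind by a previous, crashed instance
    %% before doing any new work.
    case file:list_dir(?WORK_DIR) of
        {ok, Names} ->
            [file:delete(filename:join(?WORK_DIR, N)) || N <- Names];
        {error, _} ->
            ok                       % nothing left over (or directory missing)
    end,
    {ok, #{}}.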
Think carefully about why you want to clean up, and if it really has to be done when the process crashes (it is a bug, after all). Be careful that you don't end up doing too much defensive programming.
This related question is quite fresh and relevant:
When does terminate/2 get called in a gen_server?
I am creating a test app with one supervisor using the simple_one_for_one strategy and many worker children added to it dynamically. How can I implement a callback (or receive a message) in the supervisor that will be invoked when a child exits normally?
The main goal is to notify some other process that all supervised worker processes are done and it's time to show the final report.
How should I design this? Should I create my own behaviour that combines supervisor and gen_server, or is there a way to do this with the standard OTP behaviours?
There are two ways to do such a notification. The first is to simply monitor the child from the beginning. By using erlang:monitor/2, a third party can be told whether a process is alive or not. When the monitored process dies, this is turned into a message that gives the monitoring process the exit reason.
The other way could be to use a bit of message sending in the process' terminate/2 function (terminate/3 if it's a gen_fsm). This is far more brittle, because the terminate function will not be called in all circumstances.
The monitor option is far superior.
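A minimal sketch of the monitor-based approach: a made-up tracker gen_server starts workers through the supervisor (here called worker_sup) and messages a Reporter pid once all of them have exited, whatever the exit reason:

-module(completion_tracker).
-behaviour(gen_server).

-export([start_link/1, start_worker/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link(Reporter) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, Reporter, []).

start_worker(Args) ->
    gen_server:call(?MODULE, {start_worker, Args}).

init(Reporter) ->
    {ok, #{reporter => Reporter, pending => 0}}.

handle_call({start_worker, Args}, _From, #{pending := N} = S) ->
    {ok, Pid} = supervisor:start_child(worker_sup, [Args]),
    erlang:monitor(process, Pid),         % we get a 'DOWN' whatever the reason
    {reply, {ok, Pid}, S#{pending := N + 1}}.

handle_info({'DOWN', _Ref, process, _Pid, _Reason},
            #{reporter := Reporter, pending := N} = S) ->
    case N - 1 of
        0 ->
            Reporter ! all_workers_done,  % every worker has now exited
            {noreply, S#{pending := 0}};
        Left ->
            {noreply, S#{pending := Left}}
    end;
handle_info(_Msg, S) ->
    {noreply, S}.

handle_cast(_Msg, S) -> {noreply, S}.
terminate(_Reason, _S) -> ok.
code_change(_OldVsn, S, _Extra) -> {ok, S}.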