Erlang: Best way for a singleton gen_server in erlang cluster? - erlang

Setting:
I want to start a unique global registered gen_server process in an erlang cluster. If the process is stopped or the node running it goes down, the process is to be started on one of the other nodes.
The process is part of a supervisor. The problem is that starting the supervisor on a second node fails because the gen_server is already running and registerd globally from the first node.
Question(s):
Is it ok to check if the process is already globally registered inside gen_server's start_link function and in this case return {ok, Pid} of the already running process instead of launching a new gen_server instance?
Is it correct, that this way the one process would be part of multiple supervisors and if the one process goes down all supervisors on all other nodes would try to restart the process. The first supervisor would create a new gen_server process and the other supervisors would all link to that one process again.
Should I use some sort of global:trans() inside the gen_server's start_link function?
Example Code:
start_link() ->
global:trans({?MODULE, ?MODULE}, fun() ->
case gen_server:start_link({global, ?MODULE}, ?MODULE, [], []) of
{ok, Pid} ->
{ok, Pid};
{error, {already_started, Pid}} ->
link(Pid),
{ok, Pid};
Else -> Else
end
end).

If you return {ok, Pid} of something you don't link to it will confuse a supervisor that relies on the return value. If you're not going to have a supervisor use this as a start_link function you can get away with it.
Your approach seems like it should work as each node will try to start a new instance if the global one dies. You may find that you need to increase the MaxR value in your supervisor setup as you'll get process messages every time the member of the cluster changes.
One way I've created global singletons in the past is to run the process on all the nodes, but have one of them (the one that wins the global registration race) be the master. The other processes monitor the master and when the master exits, try to become the master. (And again, if they don't win the registration race then they monitor the pid of the one that did). If you do this, you have to handle the global name registration yourself (i.e. don't use the gen_server:start({global, ... functionality) because you want the process to start whether or not it wins the registration, it will simply behave differently in each case.
The process itself must be more complicated (it has to run in both master and non-master modes), but it stabilizes quickly and doesn't produce a lot of log spam with supervisor start attempts.
My method usually requires a few rounds of revision to shake out the corner cases, but is to my mind less hassle than writing an OTP Distributed Application. This method has another advantage over distributed applications in that you don't have to statically configure the list of nodes involved in your cluster - any node can be a candidate for running the master copy of the process. Your approach has this same property.

How about turning the gen_server into an application and using distributed applications?

Related

Can an OTP supervisor monitor a process on a remote node?

I'd like to use erlang's OTP supervisor in a distributed application I'm building. But I'm having trouble figuring out how a supervisor of this kind can monitor a process running on a remote Node. Unlike erlang's start_link function, start_child has no parameters for specifying the Node on which the child will be spawned.
Is it possible for a OTP supervisor to monitor a remote child, and if not, how can I achieve this in erlang?
supervisor:start_child/2 can be used across nodes.
The reason for your confusion is just a mix-up regarding the context of execution (which is admittedly a bit hard to keep straight sometimes). There are three processes involved in any OTP spawn:
The Requestor
The Supervisor
The Spawned Process
The context of the Requestor is the one in which supervisor:start_child/2 is called -- not the context of the supervisor itself. You would normally provide a supervisor interface by exporting a function that wraps the call to supervisor:spawn_child/2:
do_some_crashable_work(Data) ->
supervisor:start_child(sooper_dooper_sup, [Data]).
That might be defined and exported from the supervisor module, be defined internally within a "manager" sort of process according to the "service manager/supervisor/workers" idiom, or whatever. In all cases, though, some process other than the supervisor is making this call.
Now look carefully at the Erlang docs for supervisor:start_child/2 again (here, and an R19.1 doc mirror since sometimes erlang.org has a hard time for some reason). Note that the type sup_ref() can be a registered name, a pid(), a {global, Name} or a {Name, Node}. The requestor may be on any node calling a supervisor on any other node when calling with a pid(), {global, Name} or a {Name, Node} tuple.
The supervisor doesn't just randomly kick things off, though. It has a child_spec() it is going off of, and the spec tells the supervisor what to call to start that new process. This first call into the child module is made in the context of the supervisor and is a custom function. Though we typically name it something like start_link/N, it can do whatever we want as a part of startup, including declare a specific node on which to spawn. So now we wind up with something like this:
%% Usually defined in the requestor or supervisor module
do_some_crashable_work(SupNode, WorkerNode, Data) ->
supervisor:start_child({sooper_dooper_sup, SupNode}, [WorkerNode, Data]).
With a child spec of something like:
%% Usually in the supervisor code
SooperWorker = {sooper_worker,
{sooper_worker, start_link, []},
temporary,
brutal_kill,
worker,
[sooper_worker]},
Which indicates that the first call would be to sooper_worker:start_link/2:
%% The exported start_link function in the worker module
%% Called in the context of the supervisor
start_link(Node, Data) ->
Pid = proc_lib:spawn_link(Node, ?MODULE, init, [self(), Data]).
%% The first thing the newly spawned process will execute
%% in its own context, assuming here it is going to be a gen_server.
init(Parent, Data) ->
Debug = sys:debug_options([]),
{ok, State} = initialize_some_state(Data)
gen_server:enter_loop(Parent, Debug, State).
You might be wondering what all that mucking about with proc_lib was for. It turns out that while calling for a spawn from anywhere within a multi-node system to initiate a spawn anywhere else within a multi-node system is possible, it just isn't a very useful way of doing business, and so the gen_* behaviors and even proc_lib:start_link/N doesn't have a method of declaring the node on which to spawn a new process.
What you ideally want is nodes that know how to initialize themselves and join the cluster once they are running. Whatever services your system provides are usually best replicated on the other nodes within the cluster, and then you only have to write a way of picking a node, which allows you to curry out the business of startup entirely as it is now node-local in every case. In this case whatever your ordinary manager/supervisor/worker code does doesn't have to change -- stuff just happens, and it doesn't matter that the requestor's PID happens to be on another node, even if that PID is the address to which results must be returned.
Stated another way, we don't really want to spawn workers on arbitrary nodes, what we really want to do is step up to a higher level and request that some work get done by another node and not really care about how that happens. Remember, to spawn a particular function based on an {M,F,A} call the node you are calling must have access to the target module and function -- if it has a copy of the code already, why isn't it a duplicate of the calling node?
Hopefully this answer explained more than it confused.

Erlang: Is it ok to write application without a supervisor?

I don't need a supervisor for some specific application I develop. Is it ok to not to use one?
The doc says about the start/2 that
"should return {ok,Pid} or {ok,Pid,State} where Pid is the pid of the
top supervision"
so I'm not sure if it is OK not to start a supervisor and to return some invalid pid (I tried and nothing bad happened)
Returning an {ok, self()} or something similar works fine until you start doing release upgrades. At that point, you'll need to use a supervisor with an empty child list. (The application and supervisor behaviours don't have colliding callback functions, so you can put both in the same module.)
Just to make sure: you are doing some kind of initialisation in your application module's start callback function, right? If not, you can just remove the mod directive from the .app file and the callback won't even be called, and thus there will be no supervisor, real or fake.

Is it bad to send a message to self() in init?

In this example, the author avoids a deadlock situation by doing:
self() ! {start_worker_supervisor, Sup, MFA}
in his gen_server's init function. I did something similar in one of my projects and was told this method was frowned upon, and that it was better to cause an immediate timeout instead. What is the accepted pattern?
Update for Erlang 19+
Consider using the new gen_statem behaviour. This behaviour supports generating of events internal to the FSM:
The state function can insert events using the action() next_event and such an event is inserted as the next to present to the state function. That is, as if it is the oldest incoming event. A dedicated event_type() internal can be used for such events making them impossible to mistake for external events.
Inserting an event replaces the trick of calling your own state handling functions that you often would have to resort to in, for example, gen_fsm to force processing an inserted event before others.
Using the action functionality in that module, you can ensure your event is generated in init and always handled before any external events, specifically by creating a next_event action in your init function.
Example:
...
callback_mode() -> state_functions.
init(_Args) ->
{ok, my_state, #data{}, [{next_event, internal, do_the_thing}]}
my_state(internal, do_the_thing, Data) ->
the_thing(),
{keep_state, Data);
my_state({call, From}, Call, Data) ->
...
...
Old answer
When designing a gen_server you generally have the choice to perform actions in three different states:
When starting up, in init/1
When running, in any handle_* function
When stopping, in terminate/2
A good rule of thumb is to execute things in the handling functions when acting upon an event (call, cast, message etc). The stuff that gets executed in init should not wait for events, that's what the handle callbacks are for.
So, in this particular case, a kind of "fake" event is generated. I'd say it seems that the gen_server always wants to initiate the starting of the supervisor. Why not just do it directly in init/1? Is there really a requirement to be able to handle another message in-between (the effect of doing it in handle_info/2 instead)? That windown is so incredibly small (the time between start of the gen_server and the sending of the message to self()) so it's highly unlikely to happen at all.
As for the deadlock, I would really advise against calling your own supervisor in your init function. That's just bad practice. A good design pattern for starting worker process would be one top level supervisor, with a manager and a worker supervisor beneath. The manager starts workers by calling the worker supervisor:
[top_sup]
| \
| \
| \
man [work_sup]
/ | \
/ | \
/ | \
w1 ... wN
Just to complement what has already been said about splitting a servers initialisation into two parts, the first in the init/1 function and the second in either handle_cast/2 or handle_info/2. There is really only one reason to do this and that is if the initialisation is expected to take a long time. Then splitting it up will allow the gen_server:start_link to return faster which can be important for servers started by supervisors as they "hang" while starting their children and one slow starting child can delay the whole supervisor startup.
In this case I don't think it is bad style to split the server initialisation.
It is important to be careful with errors. An error in init/1 will cause the supervisor to terminate while an error in the second part as they will cause the supervisor to try and restart that child.
I personally think it is better style for the server to send a message to itself, either with an explicit ! or a gen_server:cast, as with a good descriptive message, for example init_phase_2, it will be easier to see what is going on, rather than a more anonymous timeout. Especially if timeouts are used elsewhere as well.
Calling your own supervisor sure does seem like a bad idea, but I do something similar all the time.
init(...) ->
gen_server:cast(self(), startup),
{ok, ...}.
handle_cast(startup, State) ->
slow_initialisation_task_reading_from_disk_fetching_data_from_network_etc(),
{noreply, State}.
I think this is clearer than using timeout and handle_info, it's pretty much guaranteed that no message can get ahead of the startup message (no one else has our pid until after we've sent that message), and it doesn't get in the way if I need to use timeouts for something else.
This may be very efficient and simple solution, but I think it is not good erlang style.
I am using timer:apply_after, which is better and does not make impression of interacting with external module/gen_*.
I think that the best way would be to use state machines (gen_fsm). Most of our gen_srvers are really state machine, however because initial work effort to set up get_fsm I think we end up with gen_srv.
To conclude, I would use timer:apply_after to make code clear and efficient or gen_fsm to be pure Erlang style (even faster).
I have just read code snippets, but example itself is somehow broken -- I do not understand this construct of gen_srv manipulating supervisor. Even if it is manager of some pool of future children, this is even more important reason to do it explicitly, without counting on processes' mailbox magic. Debugging this would be also hell in some bigger system.
Frankly, I don't see a point in splitting initialization. Doing heavy lifting in init does hang supervisor, but using timeout/handle_info, sending message to self() or adding init_check to every handler (another possibility, not very convenient though) will effectively hang calling processes. So why do I need "working" supervisor with "not quite working" gen_server? Clean implementation should probably include "not_ready" reply for any message during initialization (why not to spawn full initialization from init + send message back to self() when complete, which would reset "not_ready" status), but then "not ready" reply should be properly processed by the caller and this adds a lot of complexity. Just suspending a reply is not a good idea.

How do you parameterize a gen_server module?

EDIT:
I'm not looking to use parameters as a general purpose way to construct Erlang programs--I'm still learning the traditional design principles. I'm also not looking to emulate OOP. My only point here is to make my gen_server calls consistent across server instances. This seems more like fixing a broken abstraction to me. I can imagine a world where the language or OTP made it convenient to use any gen_server instance's api, and that's a world I want to live in.
Thanks to Zed for showing that my primary objective is possible.
Can anyone figure out a way to use parameterized modules on gen_servers? In the following example, let's assume that test_child is a gen_server with one parameter. When I try to start it, all I get is:
42> {test_child, "hello"}:start_link().
** exception exit: undef
in function test_child:init/1
called as test_child:init([])
in call from gen_server:init_it/6
in call from proc_lib:init_p_do_apply/3
Ultimately, I'm trying to figure out a way to use multiple named instances of a gen_server. As far as I can tell, as soon as you start doing that, you can't use your pretty API anymore and have to throw messages at your instances with gen_server:call and gen_server:cast. If I could tell instances their names, this problem could be alleviated.
I just want to say two things:
archaelus explains it correctly. As he says the final way he shows is the recommended way of doing it and does what you expect.
never, NEVER, NEVER, NEVER use the form you were trying! It is a left over from the old days which never meant what you intended and is strongly deprecated now.
There are two parts to this answer. The first is that you probably don't want to use paramatized modules until you're quite proficient with Erlang. All they give you is a different way to pass arguments around.
-module(test_module, [Param1]).
some_method() -> Param1.
is equivalent to
-module(test_non_paramatized_module).
some_method(Param1) -> Param1.
The former doesn't buy you much at all, and very little existing Erlang code uses that style.
It's more usual to pass the name argument (assuming you're creating a number of similar gen_servers registered under different names) to the start_link function.
start_link(Name) -> gen_server:start_link({local, Name}, ?MODULE, [Name], []).
The second part to the answer is that gen_server is compatible with paramatized modules:
-module(some_module, [Param1, Param2]).
start_link() ->
PModule = ?MODULE:new(Param1, Param2),
gen_server:start_link(PModule, [], []).
Param1 and Param2 will then be available in all the gen_server callback functions.
As Zed mentions, as start_link belongs to a paramatized module, you would need to do the following in order to call it:
Instance = some_module:new(Param1, Param2),
Instance:start_link().
I find this to be a particularly ugly style - the code that calls some_module:new/n must know the number and order of module parameters. The code that calls some_module:new/n also cannot live in some_module itself. This in turn makes a hot upgrade more difficult if the number or order of the module parameters change. You would have to coordinate loading two modules instead of one (some_module and its interface/constructor module) even if you could find a way to upgrade running some_module code. On a minor note, this style makes it somewhat more difficult to grep the codebase for some_module:start_link uses.
The recommended way to pass parameters to gen_servers is explicitly via gen_server:start_link/3,4 function arguments and store them in the state value you return from the ?MODULE:init/1 callack.
-module(good_style).
-record(state, {param1, param2}).
start_link(Param1, Param2) ->
gen_server:start_link(?MODULE, [Param1, Param2], []).
init([Param1, Param2]) ->
{ok, #state{param1=Param1,param2=Param2}}.
Using this style means that you won't be caught by the various parts of OTP that don't yet fully support paramatized modules (a new and still experimental feature). Also, the state value can be changed while the gen_server instance is running, but module parameters cannot.
This style also supports hot upgrade via the code change mechanism. When the code_change/3 function is called, you can return a new state value. There is no corresponding way to return a new paramatized module instance to the gen_server code.
I think you shouldn't use this feature this way. Looks like you are going after a OO-like interface to your gen_servers. You are using locally-registered names for this purpose - this add a lot of shared state into your program, which is The Bad Thing. Only crucial and central servers should be registered with register BIF - let all the others be unnamed and managed by some kind of manager on top of them (which should probably be registered under some name).
-module(zed, [Name]).
-behavior(gen_server).
-export([start_link/0, init/1, handle_cast/2]).
-export([increment/0]).
increment() ->
gen_server:cast(Name, increment).
start_link() ->
gen_server:start_link({local, Name}, {?MODULE, Name}, [], []).
init([]) ->
{ok, 0}.
handle_cast(increment, Counter) ->
NewCounter = Counter + 1,
io:format("~p~n", [NewCounter]),
{noreply, NewCounter}.
This module is working fine for me:
Eshell V5.7.2 (abort with ^G)
1> S1 = zed:new(s1).
{zed,s1}
2> S1:start_link().
{ok,<0.36.0>}
3> S1:increment().
1
ok
4> S1:increment().
2
ok

Erlang: How should I test this?

I've got an application where I cast a message to a gen_server to start an operation, then I call the gen_server every second to gather intermediate results until the operation completes. In production, it usually takes a couple of minutes, but it's only limited by the input size and I'd like to test hour long operations, too.
I want to always make sure this operation still works by running a test as needed. Ideally, I'd like to run this test multiple times with different inputs, also.
I use eunit right now, but it doesn't seem to have a purpose-built way of exercising this scenario. Does commmon test provide for this? Is there an elegant way of testing this or should I just hack something up? In general, I'm having trouble wrapping my head around how to systematically test stateful, asynchronous operations in Erlang.
Yes, common test will do this.
This is a cut down version of the common test suite skeleton that our erlang emacs mode provides (you can use the normal erlang one or erlware one):
-module(junk).
%% Note: This directive should only be used in test suites.
-compile(export_all).
-include("test_server.hrl").
%%
%% set up for the suite...
%%
init_per_suite(Config) ->
Config.
end_per_suite(_Config) ->
ok.
%%
%% setup for each case in the suite - can know which test case it is in
init_per_testcase(_TestCase, Config) ->
Config.
end_per_testcase(_TestCase, _Config) ->
ok.
%%
%% allows the suite to be programmatically managed
%%
all(doc) ->
["Describe the main purpose of this suite"];
all(suite) ->
[].
%% Test cases starts here.
%%--------------------------------------------------------------------
test_case(doc) ->
["Describe the main purpose of test case"];
test_case(suite) ->
[];
test_case(Config) when is_list(Config) ->
ok.
There are 2 basic ways you could do it.
First up start the gen_server in init_per_suite/1 and then have a large number of atomic tests that act on that long running server and then tear the gen_server down in end_per_suite/1. This is the preferred way - your gen_server should be long-running and persistent over many transactions, blah-blah...
The other way is to make a singleton test and start the gen_server with init_per_testcase/2 and tear it down in end_per_testcase/2
Testing Stateful Asynchronous operations is hard in any language, Erlang or otherwise.
I would actuall recommend using etap and have the asynchronous tests run a callback that will then run etap:end_tests()
Since etap uses a running test server and it waits for the end_test call you have a little more control for asynchronous tests.

Resources