How to describe gen_server visually? - erlang

Disclaimer: The author is a newbie in OTP having some basic knowledge of Erlang's syntax, processes and messages.
I am trying to grasp the notion of behaviours in Erlang, but a lot of questions spring to mind that keep me from understanding the whole principle of a behaviour like gen_server.
Okay, the official documentation for gen_server shows a nice diagram of a server and three clients connected with Query and Reply arrows:
http://www.erlang.org/doc/design_principles/gen_server_concepts.html
But each time I try to understand the concept further, I get stuck.
There are a lot of concepts which I cannot combine into one larger concept in my head:
behaviour implementation;
behaviour container;
behaviour interface;
callback module;
callback functions;
API functions.
I use the following resources:
Erlang/OTP in Action book;
Introduction to OTP behaviours presentation, http://www.slideshare.net/gamlidek/ceug-introduction-to-otp-behaviors-part-i-genserver;
'ErlyBank' at http://spawnlink.com/articles/an-introduction-to-gen_server-erlybank/index.html.
I am still in the state "we call one function in one module, this function calls the other function, that function creates a process... stuck"
Is there any way to describe the notion of gen_server in a diagram? How can an interaction flow between clients and a server be shown visually? (to help a not so smart newcomer to understand the concept visually)
For example like here: http://support.novell.com/techcenter/articles/img/dnd2003080506.gif
UPD: I have tried to draw a diagram of my own, but I still don't get the purpose of any connector in the diagram: http://postimage.org/image/qe215ric/full/
UPD2: This is something similar to what I would like to see: http://cryptoanarchy.org/wiki/Worker_patterns (The Model). However, it doesn't show the interaction between modules, functions and processes.

I don't have a precise drawing to explain it, but I have this chapter and the one after showing how to build gen_server starting with the abstraction principles behind it.
To help with the individual components:
behaviour implementation
The behaviour itself is a bit like what is shown in the chapter I linked before. It's a module with a bunch of functions doing all the generic stuff for you: receiving messages, defining functions and hidden protocols to communicate, etc. Advanced OTP stuff contains special kinds of messages used to do software upgrades and also special code for tracing options.
behaviour container
I'm not sure what this is supposed to be. Maybe just the module with the name of the behaviour?
behaviour interface
In the same module as your behaviour implementation, you have to define a behaviour_info/1 function. That function lets the Erlang compiler know that some callbacks are expected from any module that has -behaviour(SomeModuleName) in it. SomeModuleName corresponds to a SomeModuleName.erl (and .beam) file that contains the implementation and the behaviour_info function.
callback module
The module that will contain all the specific code, handling all the custom stuff.
callback functions
Everything that isn't generic gets to be delegated to the callback module in the form of YourModule:SomeCall(Args). These are provided by your module, the one that has the -behaviour(gen_server). line in it.
API functions
The callback module has two interfaces, if you want: the one for the gen_server behaviour (init/1, handle_call/3, handle_info/2, handle_cast/2, terminate/2, code_change/3), and the one for the user (start the server, send some information, ask for some information back).
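To make the split between generic and specific code concrete, here is a minimal hand-rolled sketch of a behaviour. It is not the real gen_server implementation: the module name my_server and its reduced callback set (init/1 and handle_call/2) are assumptions chosen only for illustration.

```erlang
%% Hypothetical toy behaviour, NOT the real gen_server.
-module(my_server).
-export([behaviour_info/1, start/2, call/2]).

%% Behaviour interface: the callbacks a callback module must export.
behaviour_info(callbacks) -> [{init, 1}, {handle_call, 2}];
behaviour_info(_Other) -> undefined.

%% Generic part: spawn the server process and let the callback
%% module build the initial state.
start(Mod, Args) ->
    spawn(fun() ->
                  {ok, State} = Mod:init(Args),
                  loop(Mod, State)
          end).

%% Generic part: the hidden message protocol used by clients.
call(Pid, Request) ->
    Ref = erlang:monitor(process, Pid),
    Pid ! {call, self(), Ref, Request},
    receive
        {reply, Ref, Reply} ->
            erlang:demonitor(Ref, [flush]),
            Reply;
        {'DOWN', Ref, process, Pid, Reason} ->
            exit(Reason)
    end.

%% Generic part: receive a message, then hand control to the
%% specific code in the callback module.
loop(Mod, State) ->
    receive
        {call, From, Ref, Request} ->
            {reply, Reply, NewState} = Mod:handle_call(Request, State),
            From ! {reply, Ref, Reply},
            loop(Mod, NewState)
    end.
```

Any module exporting init/1 and handle_call/2 could be passed as Mod; everything else (spawning, the message format, the receive loop) stays in the generic module.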
I could try to describe it that way
---------------------------------------------------------------------
|     some process      |              server process               |
------------------------+--------------------------------------------
   [client]             |     [callback]      :      [behaviour]
                        |                     :
 callback:start >-------|---------------------:--> starting the process
                        |                     :        V
                        |                     :        |
                        |       init()  <-----:-----------`
                        |         |           :
                        |         `-----------:------> initial state
  {ok, Pid}  <----------|---------------------:----------,/
                        |                     :
 callback:store  >------|---------------------:--> handles message
 (calls the process)    |    (formats msg)    :        V
                        |                     :        |
                        |    handle_call() <--:-----------`
                        |         |           :
                        |         `-----------:--> updates state, sends reply
                        |                     :        V
                        |                     :        |
   gets result <--------|---------------------:--------`
                        |                     :
All the generic parts are on the right of the server process, within the behaviour, and all the specific parts are on the left (callback). The client uses the callback module's API/interface to contact the server process and have effects on it.
You have to see the behaviour as some kind of very generic code segment that sometimes gives up its execution flow (for more precise parts, like receiving and sending messages) to the specific code (how to react to these messages).
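As a rough illustration of the two interfaces, here is a minimal callback module built on the real gen_server. The module name kv_store and its store/lookup API are hypothetical, picked to mirror the callback:start and callback:store calls in the diagram.

```erlang
%% Hypothetical callback module; the name and API are assumptions.
-module(kv_store).
-behaviour(gen_server).

%% API functions: executed in the client process.
-export([start/0, store/2, lookup/1]).
%% Callback functions: executed by gen_server in the server process.
-export([init/1, handle_call/3, handle_cast/2]).

start() ->
    gen_server:start({local, ?MODULE}, ?MODULE, [], []).

%% Formats the message and calls the server process.
store(Key, Value) ->
    gen_server:call(?MODULE, {store, Key, Value}).

lookup(Key) ->
    gen_server:call(?MODULE, {lookup, Key}).

%% Returns the initial state (an empty map).
init([]) ->
    {ok, #{}}.

%% Updates the state and tells the behaviour what to reply.
handle_call({store, Key, Value}, _From, State) ->
    {reply, ok, State#{Key => Value}};
handle_call({lookup, Key}, _From, State) ->
    {reply, maps:find(Key, State), State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

A client only ever calls kv_store:start(), kv_store:store(K, V) and kv_store:lookup(K); it never sees the messages or the state.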
Hopefully this helps.

Related

Erlang. Question about the difference of ?SERVER and ?MODULE macros

In all the samples of gen_server implementations I've seen, ?SERVER is defined as ?MODULE.
Look down here:
-define(SERVER, ?MODULE).
...
gen_server:start_link({local, ?SERVER}, ?MODULE, [], [])
The idea, as I understand it, is to run many server processes with different names but implemented in one module.
But when I tried in my experiments to run a server with a name different from the module name, I always got errors.
Can somebody please explain this subtlety to me?
The code you show does not and cannot implement multiple servers with different names, since the server name is defined to be the same as the module name. So if you try to use this code to get multiple servers implemented in one module, your attempts will fail.
The reason for introducing a separate SERVER macro with the same value as MODULE is to make things more explicit. In the start_link call the two macros may have the same value, but they serve different purposes, so it is clearer to use two instead of one.
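The two roles can be seen more clearly when the values differ. The following is a minimal sketch, assuming a hypothetical module name_demo whose process is registered under the hypothetical name name_demo_server:

```erlang
%% Hypothetical module; both names are illustrative assumptions.
-module(name_demo).
-behaviour(gen_server).
-export([start_link/0, ping/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Registered name of the process, deliberately NOT the module name.
-define(SERVER, name_demo_server).

start_link() ->
    %% 1st argument: the name to register the process under.
    %% 2nd argument: the module that holds the callbacks.
    gen_server:start_link({local, ?SERVER}, ?MODULE, [], []).

ping() ->
    %% Messages are addressed to the registered name.
    gen_server:call(?SERVER, ping).

init([]) -> {ok, nostate}.

handle_call(ping, _From, State) -> {reply, pong, State}.

handle_cast(_Msg, State) -> {noreply, State}.
```

Here the callback module must stay ?MODULE, because that is where init/1 and the handle_* functions live, while the registered name can be any atom.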

Why do gen_server examples always define SERVER macro?

Why do some examples (and templates in text editor) of gen_server have:
-define(SERVER, ?MODULE).
Is there any good reason for it?
This question was prompted by Inaka's guidelines, where they state the opposite:
Don't use macros for module or function names
Here is the code example they provide:
-module(macro_mod_names).
-define(SERVER, ?MODULE). % Oh, god! Why??
-define(TM, another_module).
-export([bad/1, good/1]).
bad(Arg) ->
Parsed = gen_server:call(?SERVER, {parse, Arg}),
?TM:handle(Parsed).
good(Arg) ->
Parsed = gen_server:call(?MODULE, {parse, Arg}),
another_module:handle(Parsed).
Why does every example (and templates in text editor) of gen_server always have -define(SERVER, ?MODULE)?
Searching for "erlang gen_server example", none of the hits on the first page define this macro for me (and in fact I haven't seen it before). In particular, this includes the Erlang documentation's own http://erlang.org/doc/design_principles/gen_server_concepts.html, "Learn you some Erlang", and the Erlang wikibook.
Is there any good reason for it?
The reason is clearly to use a more "descriptive" name; whether this is a good reason is a question of taste.
I think it is a good practice to use -define to define and document relevant variables for the module. This is especially true for variables that get used at different places in the module and you want to make it configurable.
Actually, I think your question approaches this from the wrong side: the gen_server name is a module-wide configurable variable (and hence it is best practice to define it), and for the sake of simplicity it became common practice to choose a server name equal to the module name. A gen_server's name is normally registered so you can send messages to it. Since the name is a critical variable here (and there might even be cases where you would like to change it), it is normally -defined.
I also think that the guidelines you quote are speaking about a different use case for macros.

Completely confused about MapReduce in Riak + Erlang's riakc client

The main thing I'm confused about here (I think) is what the arguments to the qfun are supposed to be and what the return value should be. The README basically doesn't say anything about this and the example it gives throws away the second and third args.
Right now I'm only trying to understand the arguments and not using Riak for anything practical. Eventually I'll be trying to rebuild our (slow, MySQL-based) financial reporting system with it. So ignoring the pointlessness of my goal here, why does the following give me a badfun exception?
The data is just tuples (pairs) of Names and Ages, with the keys being the name. I'm not doing any conversion to JSON or such before inserting the data from the Erlang console.
Now with some {Name, Age} pairs stored in <<"people">> I want to use MapReduce (for no other reason than to understand "how") to get the values back out, unchanged in this first use.
riakc_pb_socket:mapred(
Pid, <<"people">>,
[{map, {qfun, fun(Obj, _, _) -> [Obj] end}, none, true}]).
This just gives me a badfun, however:
{error,<<"{\"phase\":0,\"error\":\"{badfun,#Fun<erl_eval.18.17052888>}\",\"input\":\"{ok,{r_object,<<\\\"people\\\">>,<<\\\"elaine\\\">"...>>}
How do I just pass the data through my map function unchanged? Is there any better documentation of the Erlang client than what is in the README? That README seems to assume you already know what the inputs are.
There are 2 Riak Erlang clients that serve different purposes.
The first one is the internal Riak client that is included in the riak_kv module (riak_client.erl and riak_object.erl). This can be used if you are attached to the Riak console or if you are writing a MapReduce function or a commit hook. As it is run from within a Riak node it works quite well with qfuns.
The other client is the official Riak client for Erlang that is used by external applications and connects to Riak through the protocol buffers interface. This is what you are using in your example above. As this connects through protocol buffers, it is usually recommended that MapReduce functions in Erlang are compiled and deployed on the nodes of the cluster as named functions. This will also make them accessible from other client libraries.
I think my code is actually correct and my problem lies in the fact I'm trying to use the shell to execute the code. I need to actually compile the code before it can be run in Riak. This is a limitation of the Erlang shell and the way it compiles funs.
After a few days of playing around, here's a neat trick that makes development easier. Exploit Erlang's RPC support and the fact it has runtime code loading, to distribute your code across all the Riak nodes:
%% Call this somewhere during your app's initialization routine.
%% Assumes you have a list of available Riak nodes in your app's env.
load_mapreduce_in_riak() ->
    load_mapreduce_in_riak(application:get_env(app_name, riak_nodes, [])).
load_mapreduce_in_riak([]) ->
    ok;
load_mapreduce_in_riak([{Node, Cookie}|Tail]) ->
    erlang:set_cookie(Node, Cookie),
    case net_adm:ping(Node) of
        pong ->
            {Mod, Bin, Path} = code:get_object_code(app_name_mapreduce),
            rpc:call(Node, code, load_binary, [Mod, Path, Bin]);
        pang ->
            io:format("Riak node ~p down! (ping <-> pang)~n", [Node])
    end,
    load_mapreduce_in_riak(Tail).
Now you can refer to any of the functions in the module app_name_mapreduce and they'll be visible to the Riak cluster. The code can be removed again with code:delete/1, if needed.
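As a sketch of what such a module might contain, here is a pass-through map function equivalent to the qfun from the question. The function name map_identity is hypothetical, and the {modfun, Module, Function} phase shown in the trailing comment is my assumption about how the compiled function would be referenced from the client, so check it against your riakc version.

```erlang
%% Hypothetical contents for the app_name_mapreduce placeholder module.
-module(app_name_mapreduce).
-export([map_identity/3]).

%% Map phase function. It receives the stored object, the key data
%% and the static phase argument, and must return a list.
%% Here the object is passed through unchanged.
map_identity(Obj, _KeyData, _Arg) ->
    [Obj].

%% Assumed client-side usage once the module is loaded on every node:
%%
%%   riakc_pb_socket:mapred(
%%       Pid, <<"people">>,
%%       [{map, {modfun, app_name_mapreduce, map_identity}, none, true}]).
```

Because the function is referenced by module and name instead of being shipped as a shell-created fun, the Riak nodes only need the compiled module to be loaded.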

Is it bad to send a message to self() in init?

In this example, the author avoids a deadlock situation by doing:
self() ! {start_worker_supervisor, Sup, MFA}
in his gen_server's init function. I did something similar in one of my projects and was told this method was frowned upon, and that it was better to cause an immediate timeout instead. What is the accepted pattern?
Update for Erlang 19+
Consider using the new gen_statem behaviour. This behaviour supports generating events internal to the FSM:
The state function can insert events using the action() next_event and such an event is inserted as the next to present to the state function. That is, as if it is the oldest incoming event. A dedicated event_type() internal can be used for such events making them impossible to mistake for external events.
Inserting an event replaces the trick of calling your own state handling functions that you often would have to resort to in, for example, gen_fsm to force processing an inserted event before others.
Using the action functionality in that module, you can ensure your event is generated in init and always handled before any external events, specifically by creating a next_event action in your init function.
Example:
...
callback_mode() -> state_functions.
init(_Args) ->
    %% The next_event action is handled before any external event.
    {ok, my_state, #data{}, [{next_event, internal, do_the_thing}]}.
my_state(internal, do_the_thing, Data) ->
    the_thing(),
    {keep_state, Data};
my_state({call, From}, Call, Data) ->
...
...
Old answer
When designing a gen_server you generally have the choice to perform actions in three different states:
When starting up, in init/1
When running, in any handle_* function
When stopping, in terminate/2
A good rule of thumb is to execute things in the handling functions when acting upon an event (call, cast, message etc). The stuff that gets executed in init should not wait for events, that's what the handle callbacks are for.
So, in this particular case, a kind of "fake" event is generated. I'd say it seems that the gen_server always wants to initiate the starting of the supervisor. Why not just do it directly in init/1? Is there really a requirement to be able to handle another message in between (the effect of doing it in handle_info/2 instead)? That window (the time between the start of the gen_server and the sending of the message to self()) is so incredibly small that it's highly unlikely to happen at all.
As for the deadlock, I would really advise against calling your own supervisor in your init function. That's just bad practice. A good design pattern for starting worker process would be one top level supervisor, with a manager and a worker supervisor beneath. The manager starts workers by calling the worker supervisor:
    [top_sup]
      |    \
      |     \
      |      \
     man     [work_sup]
              / | \
             /  |  \
            /   |   \
          w1   ...   wN
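A minimal sketch of the top-level supervisor for this tree might look as follows. The child modules man and work_sup are assumed to exist and to export start_link/0, and the rest_for_one strategy and restart limits are my own illustrative choices, not something the pattern prescribes. Map-based child specifications require a reasonably recent OTP release.

```erlang
%% Hypothetical top-level supervisor for the tree above.
-module(top_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% Assumption: rest_for_one, so that a crash of the worker
    %% supervisor also restarts the manager that depends on it.
    SupFlags = #{strategy => rest_for_one, intensity => 5, period => 10},
    Children = [
        %% Started first so that it exists before the manager runs.
        #{id => work_sup,
          start => {work_sup, start_link, []},
          type => supervisor},
        #{id => man,
          start => {man, start_link, []},
          type => worker}
    ],
    {ok, {SupFlags, Children}}.
```

With this layout the manager asks its sibling work_sup to start workers (for example through supervisor:start_child/2), so it never has to call its own supervisor from init/1.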
Just to complement what has already been said about splitting a server's initialisation into two parts, the first in the init/1 function and the second in either handle_cast/2 or handle_info/2: there is really only one reason to do this, and that is if the initialisation is expected to take a long time. Splitting it up then allows gen_server:start_link to return faster, which can be important for servers started by supervisors, as they "hang" while starting their children and one slow-starting child can delay the whole supervisor startup.
In this case I don't think it is bad style to split the server initialisation.
It is important to be careful with errors. An error in init/1 will cause the supervisor to terminate, while an error in the second part will cause the supervisor to try to restart that child.
I personally think it is better style for the server to send a message to itself, either with an explicit ! or a gen_server:cast, as with a good descriptive message, for example init_phase_2, it will be easier to see what is going on, rather than a more anonymous timeout. Especially if timeouts are used elsewhere as well.
Calling your own supervisor sure does seem like a bad idea, but I do something similar all the time.
init(...) ->
gen_server:cast(self(), startup),
{ok, ...}.
handle_cast(startup, State) ->
slow_initialisation_task_reading_from_disk_fetching_data_from_network_etc(),
{noreply, State}.
I think this is clearer than using timeout and handle_info, it's pretty much guaranteed that no message can get ahead of the startup message (no one else has our pid until after we've sent that message), and it doesn't get in the way if I need to use timeouts for something else.
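For comparison, the "immediate timeout" alternative mentioned in the question could be sketched like this. The module name timeout_init, the ready/1 function and the two state atoms are hypothetical, used only to make the behaviour observable.

```erlang
%% Hypothetical module illustrating the timeout-based variant.
-module(timeout_init).
-behaviour(gen_server).
-export([start_link/0, ready/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% Reports whether the second initialisation phase has run yet.
ready(Pid) ->
    gen_server:call(Pid, ready).

init([]) ->
    %% A timeout of 0 makes gen_server deliver 'timeout' to
    %% handle_info/2 as soon as it finds the mailbox empty.
    {ok, initialising, 0}.

handle_info(timeout, initialising) ->
    %% The slow initialisation work would go here.
    {noreply, ready};
handle_info(_Other, State) ->
    {noreply, State}.

handle_call(ready, _From, State) ->
    {reply, State =:= ready, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

Note that the timeout only fires if no message is waiting, so a request that reaches the mailbox before the server enters its receive loop is handled first, and ready/1 may occasionally return false right after start_link/0. The cast to self() avoids this because that message is queued inside init/1, before anyone else can know the pid.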
This may be a very efficient and simple solution, but I think it is not good Erlang style.
I am using timer:apply_after, which is better and does not give the impression of interacting with an external module/gen_*.
I think that the best way would be to use state machines (gen_fsm). Most of our gen_servers are really state machines; however, because of the initial effort needed to set up a gen_fsm, I think we end up with gen_server.
To conclude, I would use timer:apply_after to make the code clear and efficient, or gen_fsm to be pure Erlang style (even faster).
I have only read the code snippets, but the example itself is somehow broken: I do not understand this construct of a gen_server manipulating its supervisor. Even if it is the manager of some pool of future children, that is an even more important reason to do it explicitly, without counting on process mailbox magic. Debugging this would also be hell in a bigger system.
Frankly, I don't see a point in splitting initialization. Doing heavy lifting in init does hang supervisor, but using timeout/handle_info, sending message to self() or adding init_check to every handler (another possibility, not very convenient though) will effectively hang calling processes. So why do I need "working" supervisor with "not quite working" gen_server? Clean implementation should probably include "not_ready" reply for any message during initialization (why not to spawn full initialization from init + send message back to self() when complete, which would reset "not_ready" status), but then "not ready" reply should be properly processed by the caller and this adds a lot of complexity. Just suspending a reply is not a good idea.

How do you parameterize a gen_server module?

EDIT:
I'm not looking to use parameters as a general purpose way to construct Erlang programs--I'm still learning the traditional design principles. I'm also not looking to emulate OOP. My only point here is to make my gen_server calls consistent across server instances. This seems more like fixing a broken abstraction to me. I can imagine a world where the language or OTP made it convenient to use any gen_server instance's api, and that's a world I want to live in.
Thanks to Zed for showing that my primary objective is possible.
Can anyone figure out a way to use parameterized modules on gen_servers? In the following example, let's assume that test_child is a gen_server with one parameter. When I try to start it, all I get is:
42> {test_child, "hello"}:start_link().
** exception exit: undef
in function test_child:init/1
called as test_child:init([])
in call from gen_server:init_it/6
in call from proc_lib:init_p_do_apply/3
Ultimately, I'm trying to figure out a way to use multiple named instances of a gen_server. As far as I can tell, as soon as you start doing that, you can't use your pretty API anymore and have to throw messages at your instances with gen_server:call and gen_server:cast. If I could tell instances their names, this problem could be alleviated.
I just want to say two things:
archaelus explains it correctly. As he says the final way he shows is the recommended way of doing it and does what you expect.
never, NEVER, NEVER, NEVER use the form you were trying! It is a leftover from the old days which never meant what you intended and is strongly deprecated now.
There are two parts to this answer. The first is that you probably don't want to use parameterized modules until you're quite proficient with Erlang. All they give you is a different way to pass arguments around.
-module(test_module, [Param1]).
some_method() -> Param1.
is equivalent to
-module(test_non_paramatized_module).
some_method(Param1) -> Param1.
The former doesn't buy you much at all, and very little existing Erlang code uses that style.
It's more usual to pass the name argument (assuming you're creating a number of similar gen_servers registered under different names) to the start_link function.
start_link(Name) -> gen_server:start_link({local, Name}, ?MODULE, [Name], []).
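Building on that start_link/1, the API functions can simply take the instance name as their first argument, which keeps the calls consistent across instances. This is a minimal sketch; the module name named_counter and its increment/value API are hypothetical.

```erlang
%% Hypothetical module: several named instances, one API.
-module(named_counter).
-behaviour(gen_server).
-export([start_link/1, increment/1, value/1]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link(Name) ->
    gen_server:start_link({local, Name}, ?MODULE, [Name], []).

%% Every API function takes the instance name first.
increment(Name) ->
    gen_server:cast(Name, increment).

value(Name) ->
    gen_server:call(Name, value).

%% The instance keeps its own name in its state.
init([Name]) ->
    {ok, {Name, 0}}.

handle_cast(increment, {Name, N}) ->
    {noreply, {Name, N + 1}}.

handle_call(value, _From, {_Name, N} = State) ->
    {reply, N, State}.
```

After named_counter:start_link(s1) and named_counter:start_link(s2), the two counters are addressed as named_counter:increment(s1) and named_counter:increment(s2) without any raw gen_server:call or gen_server:cast in client code.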
The second part to the answer is that gen_server is compatible with parameterized modules:
-module(some_module, [Param1, Param2]).
start_link() ->
PModule = ?MODULE:new(Param1, Param2),
gen_server:start_link(PModule, [], []).
Param1 and Param2 will then be available in all the gen_server callback functions.
As Zed mentions, since start_link belongs to a parameterized module, you would need to do the following in order to call it:
Instance = some_module:new(Param1, Param2),
Instance:start_link().
I find this to be a particularly ugly style - the code that calls some_module:new/n must know the number and order of module parameters. The code that calls some_module:new/n also cannot live in some_module itself. This in turn makes a hot upgrade more difficult if the number or order of the module parameters change. You would have to coordinate loading two modules instead of one (some_module and its interface/constructor module) even if you could find a way to upgrade running some_module code. On a minor note, this style makes it somewhat more difficult to grep the codebase for some_module:start_link uses.
The recommended way to pass parameters to gen_servers is explicitly, via the gen_server:start_link/3,4 function arguments, storing them in the state value you return from the ?MODULE:init/1 callback.
-module(good_style).
-record(state, {param1, param2}).
start_link(Param1, Param2) ->
gen_server:start_link(?MODULE, [Param1, Param2], []).
init([Param1, Param2]) ->
{ok, #state{param1=Param1,param2=Param2}}.
Using this style means that you won't be caught by the various parts of OTP that don't yet fully support parameterized modules (a new and still experimental feature). Also, the state value can be changed while the gen_server instance is running, but module parameters cannot.
This style also supports hot upgrade via the code change mechanism. When the code_change/3 function is called, you can return a new state value. There is no corresponding way to return a new parameterized module instance to the gen_server code.
I think you shouldn't use this feature this way. It looks like you are going after an OO-like interface to your gen_servers. You are using locally registered names for this purpose, which adds a lot of shared state to your program, and that is a Bad Thing. Only crucial and central servers should be registered with the register BIF; let all the others be unnamed and managed by some kind of manager on top of them (which should probably be registered under some name).
-module(zed, [Name]).
-behavior(gen_server).
-export([start_link/0, init/1, handle_cast/2]).
-export([increment/0]).
increment() ->
gen_server:cast(Name, increment).
start_link() ->
gen_server:start_link({local, Name}, {?MODULE, Name}, [], []).
init([]) ->
{ok, 0}.
handle_cast(increment, Counter) ->
NewCounter = Counter + 1,
io:format("~p~n", [NewCounter]),
{noreply, NewCounter}.
This module is working fine for me:
Eshell V5.7.2 (abort with ^G)
1> S1 = zed:new(s1).
{zed,s1}
2> S1:start_link().
{ok,<0.36.0>}
3> S1:increment().
1
ok
4> S1:increment().
2
ok
