I have just learnt how to upgrade a module in Erlang, and I know that only function calls that use the fully qualified name (e.g. module:function()) get "relinked" to the current version loaded into the VM; calls that do not specify the module's name keep using the older version.
Is there a rule of thumb on when to use a fully qualified function call and when it's OK to call a function just by its name? Is it a bad idea to call all functions using their full name (like module:function())?
Erlang applications normally make use of standard behaviors like gen_server and gen_fsm, which already contain fully qualified function calls within their internal loops and so take care of this issue.
But if for some reason you feel compelled to write your own module with its own recursive message-handling loop and you want that module to be upgradeable at runtime, the loop needs to contain a fully qualified recursive call, and normally you'd place this within a section of code handling a specific upgrade message, similar to the code_change/3 function expected in a callback module used with a standard behavior.
For example, consider the loop below, which is similar to those of the standard behaviors but greatly simplified:
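loop(Callbacks, State) ->
    %% Each callback is expected to return {Next, NState}.
    {{Next, NState}, DoChange} =
        receive
            {code_change, ChangeData} ->
                {Callbacks:handle_code_change(ChangeData, State), true};
            {cast, Data} ->
                {Callbacks:handle_cast(Data, State), false};
            {call, From, Data} ->
                %% Assumed return shape: {reply, Reply, {Next, NState}} to
                %% answer the caller, or a bare {Next, NState} otherwise.
                case Callbacks:handle_call(Data, State) of
                    {reply, Reply, Result} ->
                        From ! Reply,
                        {Result, false};
                    Result ->
                        {Result, false}
                end;
            Message ->
                {Callbacks:handle_info(Message, State), false}
        end,
    case Next of
        stop -> ok;
        _ ->
            case DoChange of
                %% Fully qualified call: jumps to the newest loaded version.
                true -> ?MODULE:loop(Callbacks, NState);
                %% Local call: stays in the version currently running.
                false -> loop(Callbacks, NState)
            end
    end.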
The loop/2 function takes two arguments: Callbacks, the name of a callback module expected to export specific functions invoked for specific messages, and State, which is opaque to the loop but presumably meaningful to the callback module. The loop is tail recursive and it handles several specific messages by calling specific callback functions, and then handles any other messages by calling handle_info/2 in the callback module. (If you've used the standard behaviors you'll find this approach familiar.) The callback functions return a Next value, and a new state to be passed to the next loop. If Next is stop, we exit the loop, otherwise we check the value of DoChange, which is set to true only for code change messages, and if it's true the loop calls itself with a fully qualified call, otherwise it uses just a regular call.
As mentioned earlier, this is all greatly simplified. It's extremely rare that you would need to write your own loops, and if you do there are other important things not shown here like system messages that you need to deal with. You are best off using the standard behaviors.
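As a smaller illustration of the same rule of thumb, here is a minimal sketch of a hand-written loop that uses a local call for ordinary messages and a fully qualified call only when asked to upgrade. The module name counter_loop and the message shapes (incr, upgrade, {get, From}, stop) are made up for this example, and like the loop above it ignores system messages.

```erlang
-module(counter_loop).
-export([start/0, loop/1]).

%% Spawn the loop with an initial count of 0.
start() ->
    spawn(?MODULE, loop, [0]).

%% loop/1 must be exported so that the fully qualified call can reach it.
loop(N) ->
    receive
        incr ->
            %% Local call: keeps running the version we are already in.
            loop(N + 1);
        {get, From} ->
            From ! {count, N},
            loop(N);
        upgrade ->
            %% Fully qualified call: switches to the newest loaded version.
            ?MODULE:loop(N);
        stop ->
            ok
    end.
```

After loading a new version of the module, sending upgrade to the process is what moves it onto the new code; until then it keeps running the old version.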
Related
I have a gen_server process that registers a global name like this:
global:register_name(<<"CLIENT_", NAME/binary>>, self()),
Another process is trying to send this process a message using gen_server:call like this:
gen_server:call({global, <<"CLIENT_", NAME/binary>>}, {msg, DATA}),
If the second call happens before the first process registers the global name, it dies with:
exit with reason {noproc,{gen_server,call,[{global,<<"CLIENT_122">>},{msg, <<"TEST">>}]}}
What is the correct way to make a call only if the global name is registered, and do something else if it is not?
Three things:
How to guard this call (mechanics).
Why you generally shouldn't want to guard the call (robust architecture).
Where you are putting your interface to this function (code structure).
Mechanics
You can check whether a name is registered with the global registry before making the call like this:
-spec send_message(Name, Message) -> Result
when Name :: term(),
Message :: term(),
Result :: {ok, term()}
| {error, no_proc}.
send_message(Name, Message) ->
case global:whereis_name(Name) of
undefined ->
{error, no_proc};
PID ->
Value = gen_server:call(PID, Message),
{ok, Value}
end.
There will still be a few nanoseconds between checking the return value of global:whereis_name/1 and the actual call via gen_server:call/2,3, so you still don't know whether you just sent a call to a dead process. If the process dies in that window, gen_server:call/2,3 exits in the caller all the same, so the check narrows the race but does not remove it.
Another way to do it would be with a try ... catch construct, but that is a very tricky habit to get into.
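For completeness, a minimal sketch of the try ... catch variant, under the assumption that you only want to translate the noproc exit and let every other failure (timeouts, a crash in the callee) propagate. The module name safe_call is made up for this example; the return values mirror the spec used above.

```erlang
-module(safe_call).
-export([send_message/2]).

-spec send_message(Name :: term(), Message :: term()) ->
          {ok, term()} | {error, no_proc}.
send_message(Name, Message) ->
    try gen_server:call({global, Name}, Message) of
        Value -> {ok, Value}
    catch
        %% gen_server:call/2 exits with {noproc, _} when nothing is
        %% registered under Name or the process is already dead.
        exit:{noproc, _} -> {error, no_proc}
    end.
```

Unlike the whereis_name check, this has no window between the lookup and the call, but it makes it easy to swallow failures you actually wanted to see, which is why it is a habit to be careful with.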
Robust Architecture
All that stuff above, keep it in the back of your mind, but in the front of your mind you should want to crash if this name is unregistered. Your registered process is supposed to be alive so why are you being so paranoid?!? If things are bad you want to know they are bad in a catastrophic way and let everything related to that crash and burn straight away. Don't try to recover on your own in an unknown state, that is what supervisors are for. Let your system be restarted in a known state and give it another go. If this is a user-directed action (some user of the system, or a web page request or whatever) they will try again because they are monkeys that try things more than once. If it is an automated request (the user is a computer or robot, for example) it can retry or not, but leave that decision up to it in the common case -- but give it some indication of failure (an error message, a closed socket, etc.).
As long as the process you are calling registers its name during its init/1 call (before it has returned its own PID to its supervisor), and this is always happening before the calling process is alive or aware of the process to be called then you shouldn't have any trouble with this. If it has crashed for some reason, then you have more fundamental problems with your program and catching the caller's crash isn't going to help you. This is a basic idea in robustness engineering.
Structure your system so that the callee is guaranteed to be alive and registered before the call can occur, and if it has died you should want the caller to die also. (Yes, I'm beating a dead horse, but this is important.)
Code Structure
Most of the time you don't want a module that defines a process (let's say foo.erl, which defines a process we will name {global, "foo"}) to contain a naked call to gen_server:call/2,3 or gen_server:cast/2 that is intended for a separate process defined in another module (let's say bar.erl, which defines a process we will name {global, "bar"}). What we want instead is for bar.erl to export an interface function, and for that function to be where the gen_server:call/2 happens.
That way any special work that applies to this call (which any other calling module may also require) exists in a single spot, and you can name the interface to the process "bar" in a way that conveys some meaning aside from the message being passed to it.
For example, if the process defined by bar.erl is a connection counter (maybe we are writing a game server and we're counting connections) we might have bar.erl be in charge of maintaining the counter. So processes send a cast (asynch message) to bar any time a new user connects. Instead of having every different process that might need to do this define some complex name checking and then naked message sending, instead consider having a function exported from bar.erl that hides that mess and is named something meaningful, like bar:notify_connect(). Just calling this in your other code is much easier to understand, and you can choose how you should be dealing with this "what if bar doesn't exist?" situation right there, in one spot.
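A rough sketch of what such a bar.erl could look like. Only the name bar:notify_connect() and the {global, "bar"} registration come from the text above; start_link/0, count/0, the connect message and the integer state are assumptions made for illustration.

```erlang
-module(bar).
-behaviour(gen_server).

%% Interface functions: the only place that knows how to talk to "bar".
-export([start_link/0, notify_connect/0, count/0]).
%% gen_server callbacks.
-export([init/1, handle_call/3, handle_cast/2]).

-define(NAME, {global, "bar"}).

start_link() ->
    gen_server:start_link(?NAME, ?MODULE, [], []).

%% Callers just say what happened; the message format stays private.
notify_connect() ->
    gen_server:cast(?NAME, connect).

%% Hypothetical helper for reading the counter back.
count() ->
    gen_server:call(?NAME, count).

init([]) ->
    {ok, 0}.

handle_call(count, _From, Count) ->
    {reply, Count, Count}.

handle_cast(connect, Count) ->
    {noreply, Count + 1}.
```

If you later decide that a missing "bar" process should be handled differently, notify_connect/0 is the single place to change.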
On that note, you might want to take a look at the basic Erlang "service manager -> worker" pattern. Named processes are not overwhelmingly needed in many cases.
I am having a hard time wrapping my head around the correct way to make calls against a gen_server instance dynamically created by a supervisor with a simple_one_for_one child strategy. I am attempting to create data access controls as gen_servers. Each entity will have its own supervisor, and that supervisor will create gen_server instances as needed to actually perform CRUD operations on the database. I understand the process for defining the child processes, as well as the process for creating them as needed.
Initially, my plan was to abstract the child creation process into custom functions in the gen_server module that created a child, fired off the requested operation (e.g. find, store, delete) on that child using gen_server:call(), and then returning the operation results back to the calling process. Unless I am mistaken, though, that will block any other processes attempting to use those functions until the call returns. That is definitely not what I have in mind.
I may be stuck in OO mode (my background is Java), but it seems like there should be a clean way of allowing a function in one module to obtain a reference to a child process and then make calls against that process without leaking the internals of that child. In other words, I do not want to have to call the create_child() method on an entity supervisor and then have my application code make gen_server:call invocations against that child PID (i.e. gen_server:call(Pid, {find_by_id, Id})). I would instead like to be able to call a function more like Child:find_by_id(Id).
A full answer is highly dependent on your application; for example, one gen_server might suffice, or you might really need a pool of database connections instead. But one thing you should be aware of is that a gen_server can return from a handle_call callback before it actually has a reply ready for the client by returning {noreply, NewState} and then later, once it has a client reply ready, calling gen_server:reply/2 to send it back to the client. This allows the gen_server to service calls from other clients without blocking on the first call. Note though that this requires that the gen_server has a way of sending a request into the database without having to block waiting for a reply; this is often achieved by having the database send a reply that arrives in the gen_server's handle_info/2 callback, passing enough info back that the gen_server can associate the database reply with the correct client request. Note also that gen_server:call/2 has a default timeout of 5 seconds (gen_server:call/3 lets you set your own), so you'll need to deal with that if you expect the duration of database calls to exceed the default.
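A minimal sketch of that noreply / gen_server:reply/2 pattern. The module name async_db_server, the {db_reply, Ref, Result} message and the spawned process standing in for an asynchronous database driver are all assumptions; a real driver would have its own way of delivering replies.

```erlang
-module(async_db_server).
-behaviour(gen_server).

-export([start_link/0, find_by_id/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

find_by_id(Server, Id) ->
    gen_server:call(Server, {find_by_id, Id}).

%% State is a map of pending requests: Ref => From.
init([]) ->
    {ok, #{}}.

handle_call({find_by_id, Id}, From, Pending) ->
    Ref = make_ref(),
    Server = self(),
    %% Stand-in for a non-blocking database request (an assumption).
    spawn_link(fun() -> Server ! {db_reply, Ref, {row, Id}} end),
    %% Do not reply yet; remember who is waiting for this Ref.
    {noreply, Pending#{Ref => From}}.

handle_cast(_Msg, Pending) ->
    {noreply, Pending}.

handle_info({db_reply, Ref, Result}, Pending) ->
    case maps:take(Ref, Pending) of
        {From, Rest} ->
            %% Match the database reply to the waiting client.
            gen_server:reply(From, Result),
            {noreply, Rest};
        error ->
            {noreply, Pending}
    end;
handle_info(_Other, Pending) ->
    {noreply, Pending}.
```

The client still blocks in find_by_id/2 until its own reply arrives, but the server is free to accept other calls in the meantime.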
When you create, modify or delete a record, you don't need to wait for an answer. You can use gen_server:cast for this, but you don't need a gen_server at all: as I said in my first comment, a simple call to an interface function executed in the client process will save time.
If you want to read, there are 2 cases:
You can do something else while waiting for the answer. Then a gen_server call is OK, but a simple spawned process that waits for the answer and sends it back to the client will provide the same service.
You cannot do anything before getting the answer. Then there is no blocking issue, and I think it is really preferable to use as little code as possible, so again a simple function call will be enough.
gen_server is meant to be persistent and react to messages. I don't see the need for persistence in your example.
-module(access).
-export([add/2,get/1]).
-record(foo, {bar, baz}).
add(A,B) ->
F = fun() ->
mnesia:write(#foo{bar=A,baz=B})
end,
spawn(mnesia,activity,[transaction, F]). %% the function returns immediately,
%% but you will not know if the transaction failed
get(Bar) ->
F = fun() ->
case mnesia:read({foo, Bar}) of
[#foo{baz=Baz}] -> Baz;
[] -> undefined
end
end,
Pid = self(),
Ref = make_ref(),
Get = fun() ->
R = mnesia:activity(transaction, F),
Pid ! {Ref,baz,R}
end,
spawn(Get),
Ref. %% the function returns a ref immediately; the message {Ref,baz,Baz} is sent later.
If the problem you see is that you are leaking that the internal implementation of your db-process is a gen_server, you could implement the api such that it takes the pid as argument as well.
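%% Note: OTP's kernel already ships a module named user, so pick a
%% different module name in real code.
-module(user).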
-behaviour(gen_server).
-export([find_by_id/2]).
find_by_id(Pid, Id) ->
gen_server:call(Pid, {find_by_id, Id}).
%% Lots of code omitted
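handle_call({find_by_id, _Id}, _From, State) ->
    %% Placeholder reply: a real implementation would look the record up here.
    {reply, undefined, State}.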
%% Lots more code omitted.
This way you don't tell clients that the implementation is in fact a gen_server (although someone could use gen_server:call as well).
I am building an app which can run in two modes. A sandbox mode and a production one.
In sandbox mode, I want to make many checks in my gen_server against the database: if a table doesn't exist then create it; if a column doesn't exist then add it; if a column type doesn't allow the value I want to store then change it, etc.
In production mode, if a table does not exist or a column does not match the type of the value, it will fail and that is OK.
So, in order to avoid cumbersome code like "case State#state.is_sandbox of true -> ... ",
I would like to have two different modules for my gen_server, and I would like to change the current module either in handle_call or handle_info.
Actually, I just want to go from sandbox to production, but I think if it works this way, it could work backwards.
Thanks.
You can add module, which is a name of a module, to the state in gen_server. Then you will need 2 modules - sandbox and production that both implement the same functions (you could create a behaviour for it).
The gen_server callbacks will call module:function which will be a function either from sandbox or production module. The module can be set in init function of the gen_server, to change it, simply add a new function(s) to the gen_server:
use_production() ->
    %% Assumes the gen_server is registered locally under ?MODULE.
    gen_server:cast(?MODULE, production).
....
handle_cast(production, State) ->
    {noreply, State#state{module = production}}.
The same for the sandbox module.
An example of a gen_server's callback with the module:
handle_call(Msg, _From, #state{module = Module} = State) ->
Module:function(Msg),
{reply, ok, State}.
The function must be implemented in both sandbox and production modules.
You can get module name using os:getenv/1 (of course you have to set different names in different environments before that)
You could use a gen_event with a single handler instead, which allows you to return a swap_handler tuple (see gen_event/handle_*)
Also, you don't have to use case statements in the gen_server model. If your state contains the sandbox variable, you can define different clauses for your callback functions by binding the sandbox value in the header. For instance:
handle_call(do_stuff, _From, State = #state{sandbox = true}) ->
do_sandbox_stuff();
handle_call(do_stuff, _From, State) ->
do_nonsandbox_stuff().
In this setup Erlang automatically chooses the correct clause based on the value of the sandbox field, without you having to define a separate handler or use a case statement. Matching in the function head this way is mainly a readability win: the compiler generates essentially the same pattern-matching code for function clauses as for a case expression, so don't expect a measurable speed difference either way.
Instead of a gen_server you could use gen_fsm, a finite state machine (superseded by gen_statem in newer OTP releases), which handles this case very easily. You just have multiple states which call functions in different modules depending on the state. It basically does all the handling for you, without the need to carry an explicit state parameter, which is basically implementing an FSM by hand.
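A minimal sketch of the finite-state-machine idea, written with gen_statem (the behaviour that replaced gen_fsm) rather than gen_fsm itself. The module name mode_fsm, the store request and the tagged replies are assumptions for illustration; in a real system each state function would delegate to the sandbox or production logic.

```erlang
-module(mode_fsm).
-behaviour(gen_statem).

-export([start_link/0, use_production/1, store/2]).
-export([init/1, callback_mode/0, sandbox/3, production/3]).

start_link() ->
    gen_statem:start_link(?MODULE, [], []).

use_production(Pid) ->
    gen_statem:cast(Pid, use_production).

store(Pid, Value) ->
    gen_statem:call(Pid, {store, Value}).

%% One callback function per state name.
callback_mode() ->
    state_functions.

%% Start in the sandbox state with empty data.
init([]) ->
    {ok, sandbox, #{}}.

sandbox({call, From}, {store, Value}, Data) ->
    %% Sandbox-only checks (create table, add column, ...) would go here.
    {keep_state, Data, [{reply, From, {sandbox, Value}}]};
sandbox(cast, use_production, Data) ->
    {next_state, production, Data}.

production({call, From}, {store, Value}, Data) ->
    %% Production: no checks, failures are allowed to crash.
    {keep_state, Data, [{reply, From, {production, Value}}]};
production(cast, use_production, _Data) ->
    keep_state_and_data.
```

The current mode lives in the state name, so none of the handlers needs an is_sandbox flag or a case on it.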
I'm writing an event manager that will take a lot of different event handlers. This event manager will be notified with a lot of different events. Each handler only handles certain events and ignores the rest. Each handler can also trigger certain other events depending on the situation.
For example, first handler to handle Event1
-module (first_handler).
-behavior (gen_event).
...
handle_event(Event1, State) -> {ok, State};
handle_event(_, State) -> {ok, State}.
Second handler to handle Event2
-module (second_handler).
-behavior (gen_event).
...
handle_event(Event2, State) ->
gen_event:notify(self(), Event1),
{ok, State};
handle_event(_, State) -> {ok, State}.
The event triggering can be done by calling gen_event:notify(self(), NewEvent) within a handle_event of the handler, but I would rather abstract and export that out so that it can be called from the event manager.
Since pattern matching, ignoring events and triggering events are common to all the handlers, is there any way I can extend the gen_event behavior to provide those as built-ins?
I'll start with the default way to create a custom behavior:
-module (gen_new_event).
-behaviour (gen_event).
behaviour_info(Type) -> gen_event:behaviour_info(Type).
I'm not sure what to do next.
What are you trying to do exactly? I could not understand it from the examples you provided. In second_handler's handle_event/2, Event1 is unbound. Also, does using self() work? Shouldn't that be the registered name of the manager? I'm not sure whether handle_event/2 gets executed by the manager or by each handler process (but the latter makes more sense).
By implementing your gen_new_event module, you are implementing a handler (i.e. a callback module), and not an event manager. The fact that you have -behaviour(gen_event) means that you're asking the compiler to check that gen_new_event actually implements all the functions listed by gen_event:behaviour_info(callbacks), thereby making gen_new_event an eligible handler which you could add to an event manager via gen_event:add_handler(manager_registered_name, gen_new_event, []).
Now, if you take away -behaviour (gen_event), gen_new_event no longer has to implement the following functions:
35> gen_event:behaviour_info(callbacks).
[{init,1},
{handle_event,2},
{handle_call,2},
{handle_info,2},
{terminate,2},
{code_change,3}]
You could make gen_new_event a behaviour (i.e. an interface) by adding more functions which you will be requiring any module which uses -behaviour(gen_new_event) to implement:
-module (gen_new_event).
-export([behaviour_info/1]).
behaviour_info(callbacks) ->
[{some_fun, 2}, {some_other_fun, 3} | gen_event:behaviour_info(callbacks)].
Now, if in some module, e.g. -module(example), you add the attribute -behaviour(gen_new_event), then the module example will have to implement all the gen_event callback functions plus some_fun/2 and some_other_fun/3.
I doubt that's what you were looking for, but your last example seemed to suggest that you wanted to implement a behaviour. Note that, all you're doing by implementing a behaviour is requiring other modules to implement certain functions should they use -behaviour(your_behaviour).
(Also, if I understood you correctly, if you want to extend gen_event then you could always simply copy the code in gen_event.erl and extend it ... I guess, but is this really necessary for what you're trying to do?).
Edit
Objective: extract common code out of gen_event implementations. For example, there's a handle_event/2 clause which you want in every one of your gen_events.
One way of going about it: you could use a parameterized module. This module would implement the gen_event behaviour, but only the common behaviour which all your gen_event callback modules should have. Anything which is not "common" can be delegated to the module's parameter (which you'd bind to a module name containing the "custom" implementation of the gen_event callback). Note that parameterized modules were removed from Erlang in OTP R16, so this only applies to older releases.
E.g.
-module(abstract_gen_event, [SpecificGenEvent]).
-behaviour(gen_event).
-export(... all gen_event functions).
....
handle_event({info, _Info}, State) ->
    %% Do something which you want all your gen_events to do.
    {ok, State};
handle_event(Event, State) ->
    %% Ok, now let the particular gen_event take over:
    SpecificGenEvent:handle_event(Event, State).
%% Same sort of thing for other callback functions
....
Then you'd implement one or more gen_event modules which you'll be plugging into abstract_gen_event. Let's say one of them is a_gen_event.
Then you should be able to do:
AGenEvent = abstract_gen_event:new(a_gen_event). %% Note: the function new/x is auto-generated and will have arity according to how many parameters a parameterized module has.
Then, I guess you could pass AGenEvent to gen_event:add_handler(some_ref, AGenEvent, []) and it should work but note that I have never tried this out.
Perhaps you could also get around this using macros or (but this is a bit overkill) do some playing around at compilation time using parse_transform/2. Just a thought though. See how this parameterized solution goes first.
2nd Edit
(Note: not sure whether I should delete everything prior to what is in this section. Please let me know or just delete it if you know what you're doing).
Ok, so I tried it out myself and yes, the return value of a parameterized module will crash when feeding it to gen_event:add_handler/3's second argument... too bad :(
I can't think of any other way of going about this other than a) using macros or b) using parse_transform/2.
a)
-module(ge).
-behaviour(gen_event).
-define(handle_event,
handle_event({info, Info}, State) ->
io:format("Info: ~p~n", [Info]),
{ok, State}).
?handle_event;
handle_event(Event, State) ->
io:format("got event: ~p~n", [Event]),
{ok, State}.
So basically you would have all the callback function clauses for the common functionality defined in macro definitions in a header file which you include in every gen_event which uses this common functionality. Then you put ?X before/after each callback function which uses the common functionality... I know it's not that clean and I'm generally wary of using macros myself but hey... if the problem is really nagging you that's one way to go about it.
b) Google around for some info on using parse_transform/2 in Erlang. You could implement a parse_transform which looks for the callback functions in your gen_event modules which have the specific cases for the callbacks but do not have the generic cases (i.e. clauses like the ({info, Info}, State) in the macro above). Then you would simply add the forms which make up the generic cases.
I would suggest doing something like this (add exports):
-module(tmp).
parse_transform(Forms, Options) ->
io:format("~p~n", [Forms]),
Forms.
-module(generic).
gen(Event, State) ->
io:format("Event is: ~p~n", [Event]),
{ok, State}.
Now you can compile with:
c(tmp).
c(generic, {parse_transform, tmp}).
[{attribute,1,file,{"../src/generic.erl",1}},
{attribute,4,module,generic},
{attribute,14,compile,export_all},
{function,19,gen,2,
[{clause,19,
[{var,19,'Event'},{var,19,'State'}],
[],
[{call,20,
{remote,20,{atom,20,io},{atom,20,format}},
[{string,20,"Event is: ~p~n"},
{cons,20,{var,20,'Event'},{nil,20}}]},
{tuple,21,[{atom,21,ok},{var,21,'State'}]}]}]},
{eof,28}]
{ok,generic}
That way you can copy-paste the forms you'll be injecting. You would copy them into a proper parse_transform/2 which, rather than just printing, would actually go through your source's code and inject the code you want where you want it.
As a side note, you could include the attribute -compile({parse_transform, tmp}) to every gen_event module of yours which needs to be parse_transformed in this way to add the generic functionality (i.e. and avoid having to pass this to the compiler yourself). Just make sure tmp or whichever module contains your parse_transform is loaded or compiled in a dir on the path.
b) seems like a lot of work I know...
Your installed handlers are already running in the context of the event manager which you start and then install handlers into. So if their handle_event/2 function throws out data, they already do what you want.
You don't need to extend the event behaviour. What you do is:
handle_event(Event, State) ->
generic:handle_event(Event, State).
and then let the generic module handle the generic parts. Note that you could supply generic a way to callback to this handler module for specialized handler behaviour should you need it. For example:
generic:handle_event(fun ?MODULE:callback/2, Event, State)...
and so on.
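A minimal sketch of what the three-argument variant of that generic module could look like. The {info, Info} clause is borrowed from the macro example earlier in the thread; treating the callback as a fun of arity 2 that returns a normal handle_event/2 result is an assumption.

```erlang
-module(generic).
-export([handle_event/3]).

%% Common clause shared by every handler that delegates here.
handle_event(_Callback, {info, Info}, State) ->
    io:format("Info: ~p~n", [Info]),
    {ok, State};
%% Anything else goes back to the specialised handler.
handle_event(Callback, Event, State) ->
    Callback(Event, State).
```

A handler module would then implement handle_event/2 as a one-line call to generic:handle_event(fun ?MODULE:callback/2, Event, State) and put its specific clauses in callback/2.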
One of the things that attracted me to Erlang in the first place is the Actor model; the idea that different processes run concurrently and interact via asynchronous messaging.
I'm just starting to get my teeth into OTP and in particular looking at gen_server. All the examples I've seen - and granted they are tutorial type examples - use handle_call() rather than handle_cast() to implement module behaviour.
I find that a little confusing. As far as I can tell, handle_call is a synchronous operation: the caller is blocked until the callee completes and returns. Which seems to run counter to the async message passing philosophy.
I'm about to start a new OTP application. This seems like a fundamental architectural decision so I want to be sure I understand before embarking.
My questions are:
In real practice do people tend to use handle_call rather than handle_cast?
If so, what's the scalability impact when multiple clients can call the same process/module?
Depends on your situation.
If you want to get a result, handle_call is really common. If you're not interested in the result of the call, use handle_cast. When handle_call is used, the caller will block, yes. This is okay most of the time. Let's take a look at an example.
If you have a web server that returns the contents of files to clients, you'll be able to handle multiple clients. Each client has to wait for the contents of the files to be read, so using handle_call in such a scenario would be perfectly fine (stupid example aside).
When you really need the behavior of sending a request, doing some other processing and then getting the reply later, typically two calls are used (for example, one cast and one call to get the result) or normal message passing. But this is a fairly rare case.
Using handle_call will block the process for the duration of the call. This will lead to clients queuing up to get their replies and thus the whole thing will run in sequence.
If you want parallel code, you have to write parallel code. The only way to do that is to run multiple processes.
So, to summarize:
Using handle_call will block the caller and occupy the process called for the duration of the call.
If you want parallel activities to go on, you have to parallelize. The only way to do that is by starting more processes, and suddenly call vs cast is not such a big issue any more (in fact, it's more comfortable with call).
Adam's answer is great, but I have one point to add
Using handle_call will block the process for the duration of the call.
This is always true for the client that made the gen_server:call. It took me a while to wrap my head around this, but it doesn't necessarily mean the gen_server also has to block while answering in handle_call.
In my case, I encountered this when I created a database handling gen_server and deliberately wrote a query that executed SELECT pg_sleep(10), which is PostgreSQL-speak for "sleep for 10 seconds", and was my way of testing for very expensive queries. My challenge: I don't want the database gen_server to sit there waiting for the database to finish!
My solution was to use gen_server:reply/2:
This function can be used by a gen_server to explicitly send a reply to a client that called call/2,3 or multi_call/2,3,4, when the reply cannot be defined in the return value of Module:handle_call/3.
In code:
-module(database_server).
-behaviour(gen_server).
-define(DB_TIMEOUT, 30000).
<snip>
get_very_expensive_document(DocumentId) ->
gen_server:call(?MODULE, {get_very_expensive_document, DocumentId}, ?DB_TIMEOUT).
<snip>
handle_call({get_very_expensive_document, DocumentId}, From, State) ->
%% Spawn a new process to perform the query. Give it From,
%% the {Pid, Tag} tuple that identifies the waiting caller.
proc_lib:spawn_link(?MODULE, query_get_very_expensive_document, [From, DocumentId]),
%% This gen_server process couldn't care less about the query
%% any more! It's up to the spawned process now.
{noreply, State};
<snip>
query_get_very_expensive_document(From, DocumentId) ->
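%% Reference: http://www.erlang.org/doc/man/proc_lib.html#init_ack-1
%% Note: init_ack/1 is meant to pair with proc_lib:start_link/3,4,5; after
%% spawn_link/3 the ack reaches the gen_server as a plain {ack, Pid, ok}
%% message, which lands in handle_info/2.
proc_lib:init_ack(ok),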
Result = query(pgsql_pool, "SELECT pg_sleep(10);", []),
gen_server:reply(From, {return_query, ok, Result}).
IMO, in a concurrent world handle_call is generally a bad idea. Say we have process A (a gen_server) receiving some event (the user pressed a button) and then casting a message to process B (a gen_server) requesting heavy processing of this button press. Process B can spawn sub-process C, which in turn casts a message back to A when ready (or to B, which then casts a message to A). During the processing time both A and B are ready to accept new requests. When A receives the cast message from C (or B) it, e.g., displays the result to the user. Of course, it is possible that the second button will be processed before the first, so A should probably accumulate results in the proper order. Blocking A and B through handle_call will make this system single-threaded (though it will solve the ordering problem).
In fact, spawning C is similar to handle_call; the difference is that C is highly specialized, processes just "one message" and exits after that. B is supposed to have other functionality (e.g. limiting the number of workers, controlling timeouts), otherwise C could be spawned from A.
Edit: C is asynchronous also, so spawning C is not similar to handle_call (B is not blocked).
There are two ways to go with this. One is to change to using an event management approach. The one I am using is to use cast as shown...
submit(ResourceId,Query) ->
%%
%% non blocking query submission
%%
Ref = make_ref(),
From = {self(),Ref},
gen_server:cast(ResourceId,{submit,From,Query}),
{ok,Ref}.
And the cast/submit code is...
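handle_cast({submit,{Pid,Ref},Query},State) ->
    Result = process_query(Query,State),
    %% The submitter is assumed to be a gen_server too, so the result
    %% arrives in its handle_cast/2.
    gen_server:cast(Pid,{query_result,Ref,Result}),
    {noreply,State};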
The reference is used to track the query asynchronously.