Name specification to use when starting a gen_server in erlang by {via,RegMod,ViaName} - erlang

In the Erlang gen_server documentation, it says
{via,RegMod,ViaName}
Register the gen_server process with the registry represented by RegMod.
The RegMod callback is to export the functions register_name/2, unregister_name/1,
whereis_name/1, and send/2, which are to behave like the corresponding functions in global.
Thus, {via,global,GlobalName} is a valid reference equivalent to {global,GlobalName}.
But I cannot find any example of it. Can anyone show an example of {via,RegMod,ViaName}?
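For illustration, here is a minimal sketch of such a registry. The module name my_registry is made up; it keeps Name-to-Pid mappings in a public ETS table, and a real registry would also monitor registered pids and clean up entries when they die (which is what a library such as gproc does for you):

%% my_registry.erl -- a hypothetical minimal via-registry.
-module(my_registry).
-export([start/0, register_name/2, unregister_name/1, whereis_name/1, send/2]).

start() ->
    ets:new(?MODULE, [set, public, named_table]),
    ok.

%% Must return yes on success, no if the name is already taken.
register_name(Name, Pid) when is_pid(Pid) ->
    case ets:insert_new(?MODULE, {Name, Pid}) of
        true  -> yes;
        false -> no
    end.

unregister_name(Name) ->
    ets:delete(?MODULE, Name),
    ok.

%% Must return the pid, or the atom undefined if not registered.
whereis_name(Name) ->
    case ets:lookup(?MODULE, Name) of
        [{Name, Pid}] -> Pid;
        []            -> undefined
    end.

%% Must send Msg to the named process and return its pid,
%% exiting with badarg if the name is not registered (like global:send/2).
send(Name, Msg) ->
    case whereis_name(Name) of
        undefined -> exit({badarg, {Name, Msg}});
        Pid       -> Pid ! Msg, Pid
    end.

Starting and addressing a gen_server through it then looks like:

my_registry:start(),
{ok, _Pid} = gen_server:start_link({via, my_registry, my_server}, my_module, [], []),
gen_server:call({via, my_registry, my_server}, some_request).

In practice, gproc already implements these callbacks, so {via, gproc, {n, l, my_server}} works out of the box.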

Related

Can an OTP supervisor monitor a process on a remote node?

I'd like to use Erlang's OTP supervisor in a distributed application I'm building, but I'm having trouble figuring out how a supervisor of this kind can monitor a process running on a remote node. Unlike Erlang's start_link function, start_child has no parameters for specifying the node on which the child will be spawned.
Is it possible for an OTP supervisor to monitor a remote child, and if not, how can I achieve this in Erlang?
supervisor:start_child/2 can be used across nodes.
The reason for your confusion is just a mix-up regarding the context of execution (which is admittedly a bit hard to keep straight sometimes). There are three processes involved in any OTP spawn:
The Requestor
The Supervisor
The Spawned Process
The context of the Requestor is the one in which supervisor:start_child/2 is called -- not the context of the supervisor itself. You would normally provide a supervisor interface by exporting a function that wraps the call to supervisor:start_child/2:
do_some_crashable_work(Data) ->
    supervisor:start_child(sooper_dooper_sup, [Data]).
That might be defined and exported from the supervisor module, be defined internally within a "manager" sort of process according to the "service manager/supervisor/workers" idiom, or whatever. In all cases, though, some process other than the supervisor is making this call.
Now look carefully at the Erlang docs for supervisor:start_child/2 again (here, and an R19.1 doc mirror since sometimes erlang.org has a hard time for some reason). Note that the type sup_ref() can be a registered name, a pid(), {global, Name}, or {Name, Node}. The requestor may be on any node, calling a supervisor on any other node, when it uses a pid(), {global, Name}, or {Name, Node} tuple.
The supervisor doesn't just randomly kick things off, though. It has a child_spec() it is going off of, and the spec tells the supervisor what to call to start that new process. This first call into the child module is made in the context of the supervisor and is a custom function. Though we typically name it something like start_link/N, it can do whatever we want as a part of startup, including declare a specific node on which to spawn. So now we wind up with something like this:
%% Usually defined in the requestor or supervisor module
do_some_crashable_work(SupNode, WorkerNode, Data) ->
    supervisor:start_child({sooper_dooper_sup, SupNode}, [WorkerNode, Data]).
With a child spec of something like:
%% Usually in the supervisor code
SooperWorker = {sooper_worker,
                {sooper_worker, start_link, []},
                temporary,
                brutal_kill,
                worker,
                [sooper_worker]},
Which indicates that the first call would be to sooper_worker:start_link/2:
%% The exported start_link function in the worker module.
%% Called in the context of the supervisor.
start_link(Node, Data) ->
    Pid = proc_lib:spawn_link(Node, ?MODULE, init, [self(), Data]),
    {ok, Pid}.

%% The first thing the newly spawned process will execute
%% in its own context, assuming here it is going to be a gen_server.
init(_Parent, Data) ->
    {ok, State} = initialize_some_state(Data),
    %% enter_loop/3 takes the callback module, the start options,
    %% and the initial state
    gen_server:enter_loop(?MODULE, [], State).
You might be wondering what all that mucking about with proc_lib was for. It turns out that while spawning a process from anywhere within a multi-node system onto any other node is possible, it just isn't a very useful way of doing business, and so the gen_* behaviors, and even proc_lib:start_link/N, don't have a way of declaring the node on which to spawn a new process.
What you ideally want is nodes that know how to initialize themselves and join the cluster once they are running. Whatever services your system provides are usually best replicated on the other nodes within the cluster; then you only have to write a way of picking a node, and startup stays node-local in every case. With that in place your ordinary manager/supervisor/worker code doesn't have to change -- stuff just happens, and it doesn't matter that the requestor's PID happens to be on another node, even if that PID is the address to which results must be returned.
Stated another way, we don't really want to spawn workers on arbitrary nodes; what we really want to do is step up to a higher level and request that some work get done by another node, without caring about how that happens. Remember, to spawn a particular function based on an {M,F,A} call, the node you are calling must have access to the target module and function -- and if it has a copy of the code already, why isn't it a duplicate of the calling node?
Hopefully this answer explained more than it confused.

Erlang's FSM code_change function usage

I'm learning Erlang and I've learned about hot code loading, but I don't know how the gen_fsm behavior's code_change function works. I also can't find any example of it.
Should I create an action like so:
upgrade() ->
    gen_fsm:send_event(machine_name, upgrade).
And have a handler in the states like so:
some_state(upgrade, State) ->
    code:purge(?MODULE),
    compile:file(?MODULE),
    code:load_file(?MODULE),
    {next_state, some_state, State, 1000}.
I've tried this, but the code_change/4 function doesn't execute. How should I correctly implement hot code loading in my FSM?
The code_change callbacks are orchestrated through the supervision tree when doing a relup: during the upgrade each process is suspended, told to change its code, and then resumed. Plain hot code loading is the low-level functionality this is built on.
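You can drive that same sequence by hand from the shell, which is an easy way to see code_change/4 actually run. A sketch, assuming your FSM is registered as machine_name and its callback module is my_fsm_module (both placeholder names), and that your code_change/4 accepts undefined and [] as OldVsn and Extra:

sys:suspend(machine_name),       %% stop the FSM from processing events
code:load_file(my_fsm_module),   %% load the new version of the module
ok = sys:change_code(machine_name, my_fsm_module, undefined, []),  %% calls code_change/4
sys:resume(machine_name).        %% the FSM now runs the new code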
If you only want to replace one module and don't need to upgrade your state, you can often just load the new module from the shell (no need for the purge; it can actually be harmful).
There are up to two versions of each module running. When doing local calls (without a module name in the call) you'll stay in the old version. However when you do a qualified call (with module name) you'll always jump to the new code.
loop(S) ->
    do_processing(S), % stays in the same version
    ?MODULE:loop(S).  % always jumps to the newest version
For more complicated hot upgrades you need appups and relups, which are an advanced topic; the best intro is in LYSE (Learn You Some Erlang).

Why does Erlang have arity in its imports?

I find Erlang's import arity notation -- /n, where n is the number of arguments -- rather bizarre.
In Java and various other languages you can do something like:
import static com.stuff.Blah.myFunction;
Which will import all overloaded Blah.myFunction(..) regardless of parameters.
Besides, I guess, being explicit, why did the language designers decide this was a good idea (I'm not trying to criticize the language... just curious)?
Does it have to do with code swapping?
Or does it have to do with hiding guard methods for recursion? If so, why not keep arity on exports but drop the requirement for arity on imports?
Why would I want to be that explicit? That is, import the two-argument function but not the three-argument version of myFunction?
You should be aware of what importing functions in Erlang really does. It is a pure textual transformation. If I do an -import(foo, [bar/1,baz/2]). it means that when I write a call like bar(5) or baz(a, 3) the compiler transforms these to foo:bar(5) and foo:baz(a, 3). That is all it does, nothing else. It doesn't check anything:
It doesn't check if the module foo contains the functions bar/1 or baz/2.
It doesn't even check if the module foo exists.
Really all it does is hide that you are calling a function in another module. That is why the recommendation from experienced Erlangers is "don't use it". It was a mistake. Unfortunately it is much easier to add stupid things than to get rid of them so we were never able to remove it.
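To see what that textual transformation means in practice, here is a small made-up module; the no_such_module import compiles cleanly precisely because nothing is checked:

-module(import_demo).
-export([double_all/1, breaks_at_runtime/0]).
-import(lists, [map/2]).
-import(no_such_module, [bar/1]).   %% compiles fine: nothing is checked

%% The compiler rewrites map/2 here to lists:map/2.
double_all(Xs) ->
    map(fun(X) -> 2 * X end, Xs).

%% Also compiles, but fails at runtime with undef,
%% because bar(7) is rewritten to no_such_module:bar(7).
breaks_at_runtime() ->
    bar(7).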
"Does it have to do with code swapping?"
Yes, sort of. The unit of all code handling in Erlang is the module. So you compile modules, load modules, purge and delete modules. This means that there are no inter-module dependencies at all in the system, and the compiler makes no assumptions about other modules when it is compiling a module. No assumptions are made that the environment in which a module is compiled will be the same in which it is run. That is why it is at runtime that the system checks whether the function you are trying to call in another module exists, or even if the module itself exists. That is why the import was a purely textual transformation.
Erlang was originally developed in Prolog.
In Prolog, the arity adds meaning beyond what you would think of as 'the arguments of a function' in a procedural programming language, and that model does not carry over directly.
The clauses 'married(X,Y).' and 'married(X,Y,Z).' describe two different 'married' relationships, which can be declared as married/2 and married/3.
In procedural programming, add(a,b) or add(a,b,c) are intended to perform addition on a different number of arguments. That's not necessarily the case in Prolog, where it is possible for the relationships 'a and b, added' and 'a, b and c, added' to mean something else entirely. Needless to say, Prolog allows you to define 'add' the way you would expect a function to behave, but it allows for more. More available meaning means more need to control it.
And as in any module system, selecting what you want to expose to external clients makes sense: hence the declaration of arity.
Does it have to do with code swapping?
Kind of. The modules in Erlang are compiled separately (which is part of what allows code swapping), unlike Java classes, so the compiler doesn't know how many versions of the imported function with different arities exist. It could assume that all calls of a function with the given name come from the same module, of course, but the designers likely decided it wasn't particularly useful.
In fact, you rarely want to use imports at all, at least in my experience, just as you rarely use static imports in Java. Just write module:function, like Class.staticMethod.
Or does it have to do with hiding guard methods for recursion?
No, since not importing functions doesn't hide them in any way.

Completely confused about MapReduce in Riak + Erlang's riakc client

The main thing I'm confused about here (I think) is what the arguments to the qfun are supposed to be and what the return value should be. The README basically doesn't say anything about this and the example it gives throws away the second and third args.
Right now I'm only trying to understand the arguments and not using Riak for anything practical. Eventually I'll be trying to rebuild our (slow, MySQL-based) financial reporting system with it. So ignoring the pointlessness of my goal here, why does the following give me a badfun exception?
The data is just tuples (pairs) of Names and Ages, with the keys being the names. I'm not doing any conversion to JSON or such before inserting the data from the Erlang console.
Now with some {Name, Age} pairs stored in <<"people">> I want to use MapReduce (for no other reason than to understand "how") to get the values back out, unchanged in this first use.
riakc_pb_socket:mapred(
    Pid, <<"people">>,
    [{map, {qfun, fun(Obj, _, _) -> [Obj] end}, none, true}]).
This just gives me a badfun, however:
{error,<<"{\"phase\":0,\"error\":\"{badfun,#Fun<erl_eval.18.17052888>}\",\"input\":\"{ok,{r_object,<<\\\"people\\\">>,<<\\\"elaine\\\">"...>>}
How do I just pass the data through my map function unchanged? Is there any better documentation of the Erlang client than what is in the README? That README seems to assume you already know what the inputs are.
There are two Riak Erlang clients that serve different purposes.
The first one is the internal Riak client that is included in the riak_kv module (riak_client.erl and riak_object.erl). This can be used if you are attached to the Riak console or if you are writing a MapReduce function or a commit hook. As it is run from within a Riak node it works quite well with qfuns.
The other client is the official Riak client for Erlang that is used by external applications and connects to Riak through the protocol buffers interface. This is what you are using in your example above. As this connects through protocol buffers, it is usually recommended that MapReduce functions in Erlang are compiled and deployed on the nodes of the cluster as named functions. This will also make them accessible from other client libraries.
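As a sketch of the named-function approach: put the map function in a module, compile it, make sure it is loaded on every Riak node, and refer to it with the {modfun, Module, Function} form instead of a qfun. The names mr_example and map_identity below are made up for illustration; riak_object:get_value/1 is riak_kv's accessor for an object's value, available because the function runs inside the Riak node:

%% mr_example.erl -- compile and load this on every Riak node.
-module(mr_example).
-export([map_identity/3]).

%% Map phase: return the stored value unchanged.
map_identity(Obj, _KeyData, _Arg) ->
    [riak_object:get_value(Obj)].

The call from the client then becomes:

riakc_pb_socket:mapred(
    Pid, <<"people">>,
    [{map, {modfun, mr_example, map_identity}, none, true}]).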
I think my code is actually correct and my problem lies in the fact that I'm trying to use the shell to execute the code. I need to actually compile the code before it can be run in Riak. This is a limitation of the Erlang shell and the way it compiles funs.
After a few days of playing around, here's a neat trick that makes development easier: exploit Erlang's RPC support and runtime code loading to distribute your code across all the Riak nodes.
%% Call this somewhere during your app's initialization routine.
%% Assumes you have a list of available Riak nodes in your app's env.
load_mapreduce_in_riak() ->
    load_mapreduce_in_riak(application:get_env(app_name, riak_nodes, [])).

load_mapreduce_in_riak([]) ->
    ok;
load_mapreduce_in_riak([{Node, Cookie}|Tail]) ->
    erlang:set_cookie(Node, Cookie),
    case net_adm:ping(Node) of
        pong ->
            {Mod, Bin, Path} = code:get_object_code(app_name_mapreduce),
            rpc:call(Node, code, load_binary, [Mod, Path, Bin]);
        pang ->
            io:format("Riak node ~p down! (ping <-> pang)~n", [Node])
    end,
    load_mapreduce_in_riak(Tail).
Now you can refer to any of the functions in the module app_name_mapreduce and they'll be visible to the Riak cluster. The code can be removed again with code:delete/1, if needed.

Erlang tracing (collect data from processes only in my modules)

While tracing my modules using dbg, I ran into the problem of how to collect messages such as spawn, exit, register, unregister, link, unlink, getting_linked, and getting_unlinked, which are allowed by erlang:trace, but only for those processes that were spawned from my modules directly.
As an example, I don't need to know which processes the io module creates when I call io:format in some module's function. Does anybody know how to solve this problem?
Short answer:
one way is to look at call messages followed by spawn messages.
Long answer:
I'm not an expert on dbg. The reason is that I've been using an (imho much better, safer and even handier) alternative: pan, from https://gist.github.com/gebi/jungerl/tree/master/lib/pan
The API is summarized in the html doc.
With pan:start you can trace specifying a callback module that receives all the trace messages. Then your callback module can process them, e.g. keep track of processes in ETS or a state data that is passed into every call.
The format of the trace messages is specified under pan:scan.
For examples of callback modules, you may look at src/cb_*.erl.
Now to your question:
With pan you can trace on process handling and calls in your favourite module like this:
pan:start({ip, CallbackModule}, Node, all, [procs,call], {Module}).
where Module is the name of your module (in this case: sptest)
Then the callback module (in this case: cb_write) can look at the spawn messages that follow a call message within the same process, e.g.:
32 - {call,{<6761.194.0>,{'fun',{shell,<node>}}},{sptest,run,[[97,97,97]]},{1332,247999,200771}}
33 - {spawn,{<6761.194.0>,{'fun',{shell,<node>}}},{{<6761.197.0>,{io,fwrite,2}},{io,fwrite,[[77,101,115,115,97,103,101,58,32,126,115,126,110],[[97,97,97]]]}},{1332,247999,200805}}
As pan uses the same tracing back end as dbg, the trace messages (and the information) can be collected using the Erlang trace BIFs as well, but pan is much more secure.
