Completely confused about MapReduce in Riak + Erlang's riakc client - erlang

The main thing I'm confused about here (I think) is what the arguments to the qfun are supposed to be and what the return value should be. The README basically doesn't say anything about this and the example it gives throws away the second and third args.
Right now I'm only trying to understand the arguments and not using Riak for anything practical. Eventually I'll be trying to rebuild our (slow, MySQL-based) financial reporting system with it. So ignoring the pointlessness of my goal here, why does the following give me a badfun exception?
The data is just tuples (pairs) of Names and Ages, with the keys being the name. I'm not doing any conversion to JSON or such before inserting the data from the Erlang console.
Now with some {Name, Age} pairs stored in <<"people">> I want to use MapReduce (for no other reason than to understand "how") to get the values back out, unchanged in this first use.
riakc_pb_socket:mapred(
  Pid, <<"people">>,
  [{map, {qfun, fun(Obj, _, _) -> [Obj] end}, none, true}]).
This just gives me a badfun, however:
{error,<<"{\"phase\":0,\"error\":\"{badfun,#Fun<erl_eval.18.17052888>}\",\"input\":\"{ok,{r_object,<<\\\"people\\\">>,<<\\\"elaine\\\">"...>>}
How do I just pass the data through my map function unchanged? Is there any better documentation of the Erlang client than what is in the README? That README seems to assume you already know what the inputs are.

There are 2 Riak Erlang clients that serve different purposes.
The first one is the internal Riak client that is included in the riak_kv module (riak_client.erl and riak_object.erl). This can be used if you are attached to the Riak console or if you are writing a MapReduce function or a commit hook. As it is run from within a Riak node it works quite well with qfuns.
The other client is the official Riak client for Erlang that is used by external applications and connects to Riak through the protocol buffers interface. This is what you are using in your example above. As this connects through protocol buffers, it is usually recommended that MapReduce functions in Erlang are compiled and deployed on the nodes of the cluster as named functions. This will also make them accessible from other client libraries.
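As a rough sketch of the named-function approach, the identity map from the question could be moved into a compiled module. The module name people_mapreduce and the helper run/1 are made up for illustration, and the module is assumed to be compiled and available on the code path of every Riak node. An Erlang map function takes three arguments (the stored object, the key data and the static phase argument) and must return a list.

```erlang
-module(people_mapreduce).
-export([map_identity/3, run/1]).

%% Map phase function. Riak calls it with the stored object, any key data
%% attached to the input, and the static argument given in the phase spec.
%% It must return a list; here the object is passed through unchanged.
map_identity(Obj, _KeyData, _Arg) ->
    [Obj].

%% Client side (hypothetical helper): refer to the function by module and
%% name with modfun instead of sending an anonymous fun with qfun.
run(Pid) ->
    riakc_pb_socket:mapred(
      Pid, <<"people">>,
      [{map, {modfun, ?MODULE, map_identity}, none, true}]).
```

As far as I know, riak_kv also ships a riak_kv_mapreduce module with ready-made phases such as map_object_value/3, which may be enough for simple pass-through queries.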

I think my code is actually correct and my problem lies in the fact that I'm trying to use the shell to execute the code. I need to actually compile the code before it can be run in Riak. This is a limitation of the Erlang shell and the way it compiles funs.

After a few days of playing around, here's a neat trick that makes development easier. Use Erlang's RPC support and its runtime code loading to distribute your code across all the Riak nodes:
%% Call this somewhere during your app's initialization routine.
%% Assumes you have a list of available Riak nodes in your app's env.
load_mapreduce_in_riak() ->
    load_mapreduce_in_riak(application:get_env(app_name, riak_nodes, [])).

load_mapreduce_in_riak([]) ->
    ok;
load_mapreduce_in_riak([{Node, Cookie}|Tail]) ->
    erlang:set_cookie(Node, Cookie),
    case net_adm:ping(Node) of
        pong ->
            {Mod, Bin, Path} = code:get_object_code(app_name_mapreduce),
            rpc:call(Node, code, load_binary, [Mod, Path, Bin]);
        pang ->
            io:format("Riak node ~p down! (ping <-> pang)~n", [Node])
    end,
    load_mapreduce_in_riak(Tail).
Now you can refer to any of the functions in the module app_name_mapreduce and they'll be visible to the Riak cluster. The code can be removed again with code:delete/1, if needed.

Related

Name specification to use when starting a gen_server in erlang by {via,RegMod,ViaName}

In erlang gen_server, it says
{via,RegMod,ViaName}
Register the gen_server process with the registry represented by RegMod.
The RegMod callback is to export the functions register_name/2, unregister_name/1,
whereis_name/1, and send/2, which are to behave like the corresponding functions in global.
Thus, {via,global,GlobalName} is a valid reference equivalent to {global,GlobalName}.
But I cannot find any example about it, can anyone show an example about {via,RegMod,ViaName}?
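A minimal sketch of what such a registry module could look like. The module name via_demo and the functions start_link/1 and ping/1 are made up for illustration. The module plays both roles here: it is the RegMod (exporting the four registry callbacks, which simply delegate to global under a wrapped name) and the gen_server callback module.

```erlang
-module(via_demo).
-behaviour(gen_server).

%% Registry callbacks required by {via, RegMod, ViaName}.
-export([register_name/2, unregister_name/1, whereis_name/1, send/2]).
%% Demo API and gen_server callbacks.
-export([start_link/1, ping/1]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Hypothetical registry: wrap each name in a tuple and delegate to global.
register_name(Name, Pid) -> global:register_name({?MODULE, Name}, Pid).
unregister_name(Name)    -> global:unregister_name({?MODULE, Name}).
whereis_name(Name)       -> global:whereis_name({?MODULE, Name}).
send(Name, Msg)          -> global:send({?MODULE, Name}, Msg).

%% Register the server through this module instead of local or global.
start_link(Name) ->
    gen_server:start_link({via, ?MODULE, Name}, ?MODULE, [], []).

%% The same {via, ...} tuple is used to address the server.
ping(Name) ->
    gen_server:call({via, ?MODULE, Name}, ping).

init([]) -> {ok, nostate}.
handle_call(ping, _From, State) -> {reply, pong, State}.
handle_cast(_Msg, State) -> {noreply, State}.
```

A real registry would typically keep its own table of names instead of delegating to global; the delegation only keeps the sketch short.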

Erlang: question about the difference between the ?SERVER and ?MODULE macros

In all the sample gen_server implementations I've seen, ?SERVER is defined as ?MODULE.
For example:
-define(SERVER, ?MODULE).
...
gen_server:start_link({local, ?SERVER}, ?MODULE, [], [])
The idea, as I understand it, is to run many server processes with different names, all implemented in one module.
But when I tried to run a server with a name different from the module name in my experiments, I always got errors.
Can somebody please explain this subtlety to me?
The code you show does not and cannot implement multiple servers with different names, since the server name is defined to be the same as the module name. So if you try to use this code to get multiple servers implemented in one module, your attempts will fail.
The reason for introducing a separate SERVER macro with the same value as MODULE is to make things more explicit. In the start_link call the two macros may have the same value, but they serve different purposes, so it is clearer to use two instead of one.
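To actually run several servers from one module, the registered name has to be a parameter rather than a macro. A minimal sketch, assuming a made-up module multi_server with a start_link/1 that takes the name to register under:

```erlang
-module(multi_server).
-behaviour(gen_server).

-export([start_link/1, whoami/1]).
-export([init/1, handle_call/3, handle_cast/2]).

%% The first argument is the registered name (varies per server);
%% the second is the callback module (always this module).
start_link(Name) when is_atom(Name) ->
    gen_server:start_link({local, Name}, ?MODULE, Name, []).

%% Address a particular server by its registered name.
whoami(Name) ->
    gen_server:call(Name, whoami).

init(Name) -> {ok, Name}.
handle_call(whoami, _From, Name) -> {reply, Name, Name}.
handle_cast(_Msg, State) -> {noreply, State}.
```

Calling multi_server:start_link(server_a) and multi_server:start_link(server_b) then gives two independent processes backed by the same code.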

Erlang: how to pass a parameter from one module to another

I have an application written in Erlang. I added a supervisor for distribution, and now, after parsing configFile.cfg in the supervisor, I want to pass the configuration to my old application.
I have something like this now:
-module(supervisor_sup).
start() ->
    application_app:start().
What I want is:
-module(supervisor_sup).
-record(config,{param1,param2}).
%After parsing the configFile.cfg
Conf = #config{param1 = Param1,
               param2 = Param2},
start(Conf) ->
    application_app:start(Conf).
It is uncommon to start applications from supervisors or modules under supervisors. The preferred way is to use application dependency to make sure all applications are started, and in the right order.
However, if you want some configuration to be available from several different applications without having to parse the configuration more than once, maybe the gproc library is what you are looking for?
https://github.com/uwiger/gproc
gproc can be used to efficiently set global configuration and much more. Even in a distributed environment.

How do I build a DNS Query record in Erlang?

I am building a native Bonjour / Zeroconf library and need to build DNS query records to broadcast off to the other machines. I have tried looking thru the Erlang source code but as I am relatively new to Erlang it gets kind of dense down the bowels of all the inet_XXX.erl and .hrl files. I have a listener that works for receiving and parsing the DNS record payloads, I just can't figure out how to create the query records. What I really need to know is what I need to pass into inet_dns:encode() to get a binary I can send out. Here is what I am trying to do.
{ok,P} = inet_dns:encode(#dns_query{domain="_daap._tcp.local",type=ptr,class=in})
Here is the error I am getting:
10> test:send().
** exception error: {badrecord,dns_rec}
     in function  inet_dns:encode/1
     in call from test:send/0
11>
I finally figured it out.
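%% The #dns_rec{}, #dns_header{} and #dns_query{} records come from this header.
-include_lib("kernel/src/inet_dns.hrl").

send(Domain) ->
    {ok, S} = gen_udp:open(5555, [{reuseaddr,true}, {ip,{224,0,0,251}}, {multicast_ttl,4}, {multicast_loop,false}, {broadcast,true}, binary]),
    %% encode/1 expects a complete #dns_rec{}; the query goes in its qdlist.
    P = #dns_rec{header=#dns_header{}, qdlist=[#dns_query{domain=Domain, type=ptr, class=in}]},
    gen_udp:send(S, {224,0,0,251}, 5353, inet_dns:encode(P)),
    gen_udp:close(S).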
The fact that there is no documentation for the inet_dns module should make you very wary of using it from your code. I hope you are fully aware that no consideration will be taken to your project if they (the OTP team) feel like changing how the module is implemented and used.
Read the code for implementation ideas, or just get down to creating the DNS protocol message using the Erlang bit syntax, based on the RFCs on the DNS protocol. Creating a DNS packet is much easier than parsing it (I've been down that road myself, and the "clever tricks" to minimize packet size hardly seem worth it).
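For illustration, here is a rough sketch of the bit-syntax route for a single PTR question, following the message layout in RFC 1035 (a 12-byte header, then the name as length-prefixed labels ending in a zero byte, then the 16-bit type and class). The module and function names are made up, the ID and flags are simply left at zero, and no name compression is attempted.

```erlang
-module(dns_query_sketch).
-export([encode_ptr_query/1]).

-define(TYPE_PTR, 12).  %% QTYPE value for PTR in RFC 1035
-define(CLASS_IN, 1).   %% QCLASS value for IN in RFC 1035

%% Build a DNS message containing one PTR/IN question for Domain,
%% e.g. "_daap._tcp.local". Domain may be a string or a binary.
encode_ptr_query(Domain) ->
    %% Header: ID, flags, QDCOUNT=1, ANCOUNT, NSCOUNT, ARCOUNT (16 bits each).
    Header = <<0:16, 0:16, 1:16, 0:16, 0:16, 0:16>>,
    QName = encode_name(Domain),
    <<Header/binary, QName/binary, ?TYPE_PTR:16, ?CLASS_IN:16>>.

%% Encode a dotted name as length-prefixed labels followed by a zero byte.
encode_name(Domain) ->
    Labels = binary:split(iolist_to_binary(Domain), <<".">>, [global]),
    Encoded = [<<(byte_size(L)):8, L/binary>> || L <- Labels, L =/= <<>>],
    iolist_to_binary([Encoded, <<0>>]).
```

The resulting binary can be handed to gen_udp:send/4 directly, with no dependency on the undocumented inet_dns module.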
As explained by Magnus in the Erlang Questions Mailing list:
http://groups.google.com/group/erlang-programming/browse_thread/thread/ce547dab981219df/47c3ca96b15092e0?show_docid=47c3ca96b15092e0
You were passing a dns_query record instead of a dns_rec record to the encode/1 function.

How do I send a module to an Erlang node?

I have several nodes running in an erlang cluster, each using the same magic cookie and trusted by each other. I want to have one master node send code and modules to the other nodes. How can I do this?
Use nl(module_name). in the Erlang shell to load the module on all connected nodes.
Check out my etest project for an example of programmatically injecting a set of modules on all nodes and then starting it.
The core of this is pretty much the following code:
{Mod, Bin, File} = code:get_object_code(Mod),
{_Replies, _} = rpc:multicall(Nodes, code, load_binary,
                              [Mod, File, Bin]),
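Wrapped into a function, the same idea might look like the sketch below. The module name push_code and the function load_on_nodes/2 are made up for illustration; the only addition over the snippet above is handling the case where code:get_object_code/1 cannot find the beam file locally and returns error.

```erlang
-module(push_code).
-export([load_on_nodes/2]).

%% Load the object code of Mod on every node in Nodes.
%% Returns {Replies, BadNodes} from rpc:multicall/4, or
%% {error, no_object_code} if Mod's beam file cannot be found locally.
load_on_nodes(Mod, Nodes) ->
    case code:get_object_code(Mod) of
        {Mod, Bin, File} ->
            rpc:multicall(Nodes, code, load_binary, [Mod, File, Bin]);
        error ->
            {error, no_object_code}
    end.
```

A call such as push_code:load_on_nodes(my_module, nodes()) would push my_module to every node the caller is currently connected to.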

Resources