I'm a little confused about clustering with gproc as a pubsub.
I would like to hold client sessions with gproc... It works great with one node.
However, I need to cluster the whole system.
It seems (as far as I undertand) that gproc has 2 way working with clusters, setting it as global, or using gproc_dist which seems to be a gen_leader behaviour.
Am I right so far ?
What would be the downside of each method ? (Still assuming I undertood correctly)
I think the global is implemented by gen_leader. so they are the same method.
the gproc_dist_tests.erl already give the sample code for explanation.
dist_test_() ->
{timeout, 120,
[{setup,
fun() ->
Ns = start_slaves([dist_test_n1, dist_test_n2]),
?assertMatch({[ok,ok],[]},
rpc:multicall(Ns, application, set_env,
[gproc, gproc_dist, Ns])),
?assertMatch({[ok,ok],[]},
rpc:multicall(Ns, application, start, [gproc])),
Ns
end,
fun(Ns) ->
[rpc:call(N, init, stop, []) || N <- Ns]
end,
fun(Ns) ->
{inorder,
[
{inparallel, [
fun() ->
?debugVal(t_simple_reg(Ns))
end,
fun() ->
?debugVal(t_simple_counter(Ns))
end,
fun() ->
?debugVal(t_aggr_counter(Ns))
end,
fun() ->
?debugVal(t_update_counters(Ns))
end,
fun() ->
?debugVal(t_shared_counter(Ns))
end,
fun() ->
?debugVal(t_mreg(Ns))
end,
fun() ->
?debugVal(t_await_reg(Ns))
end,
fun() ->
?debugVal(t_await_self(Ns))
end,
fun() ->
?debugVal(t_await_reg_exists(Ns))
end,
fun() ->
?debugVal(t_give_away(Ns))
end,
fun() ->
?debugVal(t_sync(Ns))
end,
fun() ->
?debugVal(t_monitor(Ns))
end,
fun() ->
?debugVal(t_subscribe(Ns))
end
]
},
fun() ->
?debugVal(t_sync_cand_dies(Ns))
end,
{timeout, 90, [fun() ->
?debugVal(t_fail_node(Ns))
end]}
]}
end
}]}.
I think gproc is created for solving the erlang's process register restriction. That's "only atom not tuple" can be used as registered key and one and only one process can be registered for one registered key.
The erlang has already provided mnesia db etc for solving client session data.
If the client session data throughout is high, I think it is not good idea to use gproc to handle it, for it will delay the process registration.
Related
I am looking at how to code "map reduce" type scenarios directly in erlang. As a toy example, imagine I want to decide which of several files is the biggest. Those files might be anywhere on the internet, so getting each one might take some time; so I'd like to gather them in parallel. Once I have them all, I can compare their sizes.
My assumed approach is as follows:
A 'main' process to co-ordinate the work and determine which is biggest;
A 'worker' process for each file, which fetches the file and returns the size to the main process.
Here's a clunky but functioning example (using local files only, but it shows the intent):
-module(cmp).
-export([cmp/2]).
cmp(Fname1, Fname2) ->
Pid1 = fsize(Fname1),
Pid2 = fsize(Fname2),
{Size1, Size2} = collect(Pid1, Pid2),
if
Size1 > Size2 ->
io:format("The first file is bigger~n");
Size2 > Size1 ->
io:format("The second file is bigger~n");
true ->
io:format("The files are the same size~n")
end.
fsize(Fname) ->
Pid = spawn(?MODULE, fsize, [self(), Fname]),
Pid.
fsize(Sender, Fname) ->
Size = filelib:file_size(Fname),
Sender ! {self(), Fname, Size}.
collect(Pid1, Pid2) ->
receive
{Pida, Fnamea, Sizea} ->
io:format("Pid: ~p, Fname: ~p, Size: ~p~n", [Pida, Fnamea, Sizea])
end,
receive
{Pidb, Fnameb, Sizeb} ->
io:format("Pid: ~p, Fname: ~p, Size: ~p~n", [Pidb, Fnameb, Sizeb])
end,
if
Pida =:= Pid1 -> {Sizea, Sizeb};
Pida =:= Pid2 -> {Sizeb, Sizea}
end.
Specific Questions
Is the approach idiomatic? i.e. hiving off each 'long running' task into a separate process, then collecting results back in a 'master'?
Is there a library to handle the synchronisation mechanics? Specifically, the collect function in the example above?
Thanks.
--
Note: I know the collect function in particular is clunky; it could be generalised by e.g. storing the pids in a list, and looping until all had completed.
In my opinion it's best to learn from an example, so I had a look at how they do that in otp/rpc and based on that I implemented a bit shorter/simpler version of the parallel eval call.
call(M, F, ArgL, Timeout) ->
ReplyTo = self(),
Keys = [spawn(fun() -> ReplyTo ! {self(), promise_reply, M:F(A)} end) || A <- ArgL],
Yield = fun(Key) ->
receive
{Key, promise_reply, {error, _R} = E} -> E;
{Key, promise_reply, {'EXIT', {error, _R} = E}} -> E;
{Key, promise_reply, {'EXIT', R}} -> {error, R};
{Key, promise_reply, R} -> R
after Timeout -> {error, timeout}
end
end,
[Yield(Key) || Key <- Keys].
I am not a MapReduce expert but I did had some experience using this 3rd party mapreduce module. So I will try to answer your question based on my current knowledge.
First, your input should be arranged as pairs of keys and values in order to properly use the mapreduce model. In general, your master process should first start workers processes (or nodes). Each worker receives a map function and a pair of key and value, lets name it {K1,V1}. It then executes the map function with the key and value and emits a new pair of key and value {K2,V2}. The master process collects the results and waits for all workers to finish their jobs. After all workers are done, the master starts the reduce part on the pairs {K2,List[V2]} that were emited by the workers. This part can be executed in parallel or not, it used to combine all the results into a single output. Note that the List[V2] is because there can be more then one value that was emited by the workers for a single K2 key.
From the 3rd party module I mentioned above:
%% Input = [{K1, V1}]
%% Map(K1, V1, Emit) -> Emit a stream of {K2,V2} tuples
%% Reduce(K2, List[V2], Emit) -> Emit a stream of {K2,V2} tuples
%% Returns a Map[K2,List[V2]]
If we look into Erlangs' lists functions, the map part is actually equal for doing lists:map/2 and the reduce part is in some way similar to lists:foldl/3 or lists:foldr/3 and the combination between them are: lists:mapfoldl/3, lists:mapfoldr/3.
If you are using this pattern of mapreduce using sets of keys and values, there is no need for special synchronization if that is what you mean. You just need to wait for all workers to finish their job.
I suggest you to go over the 3rd party module I mentioned above. Take also a look at this example. As you can see, the only things you need to define are the Map and Reduce functions.
What's correct Erlang way to have separated implementations from the contract and how to switch between them?
While others mention the behaviour feature, it is merely a small helper to make sure you implement all functions in a module for a callback structure. If you have two implementations, a and b, and both implements the same functions, you can just statically substitute a for b in a calling module. For static configuration where you have a better implementation, this is preferable.
If the question is of a more dynamic nature, you can just do
Mod = a,
Mod:f(Args).
And then in code set Mod appropriately. This lets you dynamically control what module to call while the program is running. It is not entirely clear which of the two you want.
Nice example of polymorphism is qlc module and structure table. See various M:table/1,2 implementations in ets, dets, mnesia and so. Try ets:table(ets:new(foo, [set])). in shell for example and look into qlc documentation and examples.
Since Erlang is dynamically typed, function guards (the when … -> bits) are the way to express polymorphism.
E.g:
len (T) when is_tuple(T) -> size(T);
len (L) when is_list(L) -> length(L).
Maybe have a look on behaviours concept. At least for me there is small similarity to OOP in terms of having interface definition and multiple implementation modules.
If I realized your question, here is example for approach, that pretty much works for me. This approach helps to separate interface and implementation.
"Interface" module.
-module(contract).
-export([
new/2,
do_something/2
]).
%% Behavioural callbacks definition. Each of "derived" modules should implement it.
-callback new(Arg :: any()) -> {ok, ImplState :: any()} | {error, Reason :: atom()}.
-callback do_something( Arg :: any(), ImplState :: any() ) -> {ok, ReturnVal :: any(), NewImplState :: any()} | {error, Reason :: atom()}.
%% Opaque state to hold implementation details
-record(
contract_impl, {
impl_module :: module(),
impl_state :: any()
}
).
%% Interface for creation "polymorphic" instance, like base-class constructor.
new(ImplModule, ImplInitArgs) ->
case ImplModule:new(ImplInitArgs) of
{ok, State} ->
{ok,
#contract_impl {
impl_module = ImplModule,
impl_state = State
}
};
{error, Reason} ->
{error, Reason}
end.
%% Interface function, like an abstract method in OOP.
do_something(
Arg,
#contract_impl {
impl_module = ImplModule,
impl_state = ImplState
} = ContractImpl
) ->
case ImplModule:do_something(Arg, ImplState) of
{ok, ReturnVal, NewState} ->
{ok, ReturnVal, ContractImpl#contract_impl{ impl_state = NewState }};
{error, Reason} -> {error, Reason}
end.
Some implementation example (like derived class).
-module(foo).
-behaviour(contract).
-export([
new/1,
do_something/2
]).
-record(
foo_state, {
repeat_count
}
).
new(Options) ->
{ok,
#foo_state{
repeat_count = proplists:get_value(repeat_count, Options)
}
}.
do_something(Arg, #foo_state{ repeat_count = RepeatCount } = State) ->
Result = [ io_lib:format("Foo ~p", [Arg]) || _Index <- lists:seq(1, RepeatCount) ],
{ok, Result, State}.
Now you can do the following:
usage_example() ->
{ok, State} = contract:new(foo, [{repeat_count, 15}]),
{ok, Result, NewState} = contract:do_something("bar", State),
ok.
I hope this helps.
I've been making a chat application in Erlang for a school project, but I've run into an issue. I'm trying to make my program concurrent and in order to do so I start a new thread for each message a channel is sending. I do this using lists:foreach, but I want to make sure that I don't message the person who typed in the channel.
request(State, {message, {UserID,UserPID}, Token}) ->
case catch lists:member({UserID,UserPID}, State#channel.users) of
false ->
{{error, user_not_joined}, State};
true ->
spawn( fun() ->
ListOfUsers = State#channel.users,
UserPIDs = lists:map(fun ({_, V}) -> V end, ListOfUsers),
%spawn( fun() ->end)
lists:foreach(
fun(UserPID) -> ok end,
fun(PID) ->
spawn( fun() -> genserver:request(PID, {message_from_server, State#channel.name, UserID, Token}) end)
end, UserPIDs) end),
{ok, State}
end;
What I pretty much want to do is to match with the UserPID inside the foreach. As of now I only get warnings:
channel.erl:39: Warning: variable 'UserPID' is unused
channel.erl:39: Warning: variable 'UserPID' shadowed in 'fun'
Line 3 is fun(UserPID) -> ok end,
Cheers
The answer by legoscia is fine, but I'd add that often list comprehension is simpler to use and simpler to read than lists:foreach. Note that list comprehension is able to ignore values based on clauses. Consider the following example:
-module(filter).
-export([do/0]).
do() ->
Values = lists:seq(1,10),
IgnoreThisValue = 5,
%% print all values unequal to IgnoreThisValue
[io:format("Value: ~b~n", [Value]) ||
Value <- Values, Value =/= IgnoreThisValue],
ok.
Run it in the shell:
1> make:all([load]).
Recompile: filter
up_to_date
2> filter:do().
Value: 1
Value: 2
Value: 3
Value: 4
Value: 6
Value: 7
Value: 8
Value: 9
Value: 10
A side note on your code: Why do you spawn a thread per process? I assume that you are using the behaviour gen_server (correct me if I am wrong). If so, you should consider using the cast function to simply send a message. As you do not check the result of genserver:request/2, this might be a viable option which saves you a lot of processes.
Since the function argument shadows the existing variable, you need to use a guard for that:
fun(PID) when PID =:= UserPID -> ok end
I am implementing robo-soccer which uses gen_fsm to represent strategy of each robot.
I have developed the strategy for attacker. Now, I want to use the same strategy for running multiple attackers, say 5 attackers. For this purpose, I have to copy the same gen_fsm code(200 lines) at 5 different modules resulting in total 1000 lines of code.
Is there a way to write a gen_fsm and reuse it in multiple modules?
A sample code snippet is
-module(planner).
-behaviour(gen_fsm).
start() ->
start_link().
start_link() ->
gen_fsm:start_link({local, ?SERVER}, ?MODULE, [], []).
init([]) ->
{ok, state1, ets:new(test,[public])}.
state1({test},State) ->
case ets:lookup(State,ball) of
[] ->
action!{turn,80},
{next_state,state1,State};
Data ->
% some code % %,
{next_state,state2,State}
end;
state2({test},State) ->
% --- some code ---%
You should probably start the gen_fsm process without a globally registered name. Try:
start_link() ->
gen_fsm:start_link(?MODULE, [], []).
Remember to keep track of your processes using the process id returned from gen_fsm:start_link/3.
Why not just spawn 5 processes, 1 for each attacker, that runs the same code?
Say I have some function fn1() in Erlang which returns {ok, Result} if the function was executed successfully and {error, "ErrorReason"} if there was an error.
Now in another function fn2() I call fn1() and I need to check the result of fn1 and proceed only if it is {ok, Result}.
I figured, I can do this using either case or try catch. But Efficiency is my main concern and I'd like to know which of the two methods below is more efficient:
try-catch Method
fn2() ->
try
{ok, Result} = fn1(),
%Do something with Result
ok
catch
throw:Term -> Term;
exit:Reason -> {exit, Reason};
error:Reason -> {error,{Reason,erlang:get_stacktrace()}}
end.
case Method
fn2() ->
Res = fn1(),
case Res of
{ok, Result} ->
%Do something with Result
ok;
{error, Reason} ->
Reason
end.
You really want to try and avoid try/catch like the plague. It is a very uncommon idiom in Erlang - really only used in a couple of special cases:
where you are checking user-supplied
input and you have no guarantees that
it will be 'correct'
where you have something which is
deeply nested and the cost of
unrolling it on an error condition
is too expensive
like mensia transactions
or in parser/lexer's
Try/catch is essential in languages like C++ where the application is unstable in the presence or errors, but Erlang is stable in those circumstances - the process crashes but doens't bring the system down.
You should programme the happy path, match return values and if the application deviates from what you expect then let it crash. The crash tells you you have a problem and tells you to fix it.
The problem with try/catch is that it can simply mask the problem, or even worse, move the eventual crash away from where it should happen (inside the expression you have wrapped) and make it appear elsewhere - where your programming logic expects it to have suceeded = which makes debugging much harder.
Programming the happy path and letting it crash is very disconcerting the first time you do it, it feels like going out with no clothes on, but actually you get used to it real quick :)
The case method will be more efficient, as it simply pattern matches, and does not involve building a call stack and stuff.
In both examples you are about to handle the "error" locally, so there is no point in the try catch.What you might see sometimes is something like:
fn2() ->
{ok, Result} = fn1(),
%Do stuff with Result
ok.
Here the intention is that you make fn2() throw a badmatch, if fn1() did not return ok. You let someone else "above" handle the problem. E.g. this might kill your process, and make your supervisor create a new one.
You should always measure to find things like this out.
Your code also does not do what you think it does.
-module(glurk).
-compile(export_all).
fn2() ->
try
{ok, Result} = fn1(),
%Do something with Result
ok
catch
throw:Term -> Term;
exit:Reason -> {exit, Reason};
error:Reason -> {error,{Reason,erlang:get_stacktrace()}}
end.
fn1() ->
{error, a}.
Try this out:
c(glurk).
./glurk.erl:6: Warning: variable 'Result' is unused
{ok,glurk}
16> glurk:fn2().
{error,{{badmatch,{error,a}},
[{glurk,fn2,0},
{erl_eval,do_apply,5},
{shell,exprs,6},
{shell,eval_exprs,6},
{shell,eval_loop,3}]}}
This is because fn1 did not raise an exception
it gebnerated a normal retyurn value {error, a} which
does not pattern match against {ok, Result}
The first version of your code works with a function that either returns a normal value
or raises an exception - you have to write it like this:
fn1(....) ->
...
%% success case
Val;
%% failure case
throw(...) | exit(...) | error(...)
You can't just pump the same function into fn1 and fn2.
If you had the case where the called function had to escape from a deep recursion
then the first method would be more efficient than the second - since you could
immediately exit from a deep recursion by saying throw(...).
So the answer depends upon the nature of the function that you are calling.
Code should always be optimised for beauty and not efficiency - since you have
to maintain the stuff - then it should only be optimised in the rare cases
where it is not fast enough. What needs to be optimised should be identified
by measuring the program (you will always be surprised here :-)
Me, I'd write
{ok,Result} = ...
Actually your first code has a more subtle error
fn2() ->
try
{ok, Result} = fn1(),
%Do something with Result
ok
catch
throw:Term -> Term;
exit:Reason -> {exit, Reason};
error:Reason -> {error,{Reason,erlang:get_stacktrace()}}
end.
Think about this. The caught error cases do not themselves handle the error
they just return tuples like {exit, Reason} or {error, Reason} this means that the
next layer up (ie the caller of fn2) will also have to mess around checking
error returns - if this is repeated at all levels the code will be a mess.
The "erlang" way is to have one try-catch at the top of the program and just terminate
abruptly with exit(Why) if an error occurs.
In fact often you should not even do this - you should link your process to another process
then the offending process will die and "the other processes will fix the error".
The exception propagates up the call stack and flies over to the linked processes
for treatment. So we have two types of processes - ones that have no inbuilt error handling
and processes that only do error handling.
In this case, irrespective of what is more efficient, you should definitely use the case alternative as it more succinctly describes what is going on. Your fn1() here return a value indicating if there is a successful value or an error. In the case version you directly match against this, while in the try version you match against the success value which will generate an error if an error was returned. It is unnecessarily convoluted and hides what is going on so it is bad programming style and should be avoided.
And as Gordon has already pointed out having a try there will catch more errors than you probably intend and may so mask other real errors which you should see.
The case will also be faster here, but the difference is probably small. The clarity, succinctness and good programming style is much more important!
The case will always be more efficient, but the difference is small and it's more important what makes more sense in your particular case. Especially, which approach produces more understandable code. See this benchmark:
%% Results:
%% 7> errors:run_normal(100).
%% {9,ok}
%% 8> errors:run_normal(1000).
%% {107,ok}
%% 9> errors:run_normal(10000).
%% {856,ok}
%% 10> errors:run_normal(1000, 10).
%% {263,ok}
%% 11> errors:run_wcatch(10000).
%% {2379,ok}
%% 12> errors:run_wcatch(1000, 10).
%% {401,ok}
%% 18> errors:run_normal_cplx(10000, 50).
%% {7910,ok}
%% 19> errors:run_wcatch_cplx(10000, 50).
%% {10222,ok}
-module(errors).
-compile(export_all).
run_normal(Iterations) ->
get_result(Iterations, fun() -> normal() end).
run_normal(Iterations, Level) ->
get_result(Iterations, fun() -> deepnormal(Level) end).
run_wcatch(Iterations) ->
get_result(Iterations, fun() -> wcatch() end).
run_wcatch(Iterations, Level) ->
get_result(Iterations, fun() -> deepwcatch(Level) end).
run_normal_cplx(Iterations) ->
get_result(Iterations, fun() -> normal_complex() end).
run_normal_cplx(Iterations, Level) ->
get_result(Iterations, fun() -> deepnormal_complex(Level) end).
run_wcatch_cplx(Iterations) ->
get_result(Iterations, fun() -> wcatch_complex() end).
run_wcatch_cplx(Iterations, Level) ->
get_result(Iterations, fun() -> deepwcatch_complex(Level) end).
%%------------------------------------------------------------------------------
get_result(Iterations, Fun) ->
timer:tc(fun() -> run(Iterations, Fun) end).
run(0, _Fun) ->
ok;
run(Iterations, Fun) ->
Fun(),
run(Iterations - 1, Fun).
%%------------------------------------------------------------------------------
normal() ->
case foo(atom) of
{ok, atom} -> ok;
{error, atom} -> ok
end.
normal_complex() ->
case foo_cplx() of
{ok, Res} -> Res;
{error, Res} -> Res
end.
deepnormal(Level) ->
case deepfoo(atom, Level) of
{ok, atom} -> ok;
{error, atom} -> ok
end.
deepnormal_complex(Level) ->
case deepfoo_cplx(Level) of
{ok, Res} -> Res;
{error, Res} -> Res
end.
wcatch() ->
try
{ok, atom} = foothrow(atom)
catch
throw:{error, atom} -> ok
end.
wcatch_complex() ->
try
{ok, _Res} = foothrow_cplx()
catch
throw:{error, Res} -> Res
end.
deepwcatch(Level) ->
try
{ok, atom} = deepfoothrow(atom, Level)
catch
throw:{error, atom} -> ok
end.
deepwcatch_complex(Level) ->
try
{ok, _Res} = deepfoothrow_cplx(Level)
catch
throw:{error, Res} -> Res
end.
%%------------------------------------------------------------------------------
foo(Arg) -> {error, Arg}.
foothrow(Arg) -> throw({error, Arg}).
deepfoo(Arg, 0) -> {error, Arg};
deepfoo(Arg, Level) -> deepfoo(Arg, Level - 1).
deepfoothrow(Arg, 0) -> throw({error, Arg});
deepfoothrow(Arg, Level) -> deepfoothrow(Arg, Level - 1).
foo_cplx() -> {error, {<<"Some">>, "Complex", data}}.
foothrow_cplx() -> throw({error, {<<"Some">>, "Complex", data}}).
deepfoo_cplx(0) -> {error, {<<"Some">>, "Complex", data}};
deepfoo_cplx(Level) -> deepfoo_cplx(Level - 1).
deepfoothrow_cplx(0) -> throw({error, {<<"Some">>, "Complex", data}});
deepfoothrow_cplx(Level) -> deepfoothrow_cplx(Level - 1).