Erlang: check duplicate inserted elements - erlang

I want to know whether an inserted element is a duplicate.
Here is a simple example of what I'm looking for:
The first call should return false:
check_duplicate("user", "hi").
But the second call should return true:
check_duplicate("user", "hi").

One of the best features of functional programming is pure functions. There are even functional languages, like Haskell, where you can't write an impure function. A pure function always returns the same value for the same arguments. An impure function has side effects and can return different results for the same arguments, which means it must change some state that is not visible as an argument to the function. That is exactly what you are asking for, and Erlang allows it. You have several options. The cleanest is to send a message to another process and receive its reply. (It's still impure, but idiomatic in Erlang. The following code is very simple and not ready for production use; you should use OTP behaviours and design principles for that.)
has_dupes(Jid, Text) ->
    Ref = make_ref(),
    seen ! {Ref, self(), {Jid, Text}},
    receive {Ref, Result} -> Result end.

start_seen() ->
    spawn(fun() -> register(seen, self()), loop_seen([]) end).

loop_seen(Seen) ->
    receive {Ref, From, Term} ->
        case lists:member(Term, Seen) of
            true ->
                From ! {Ref, true},
                loop_seen(Seen);
            false ->
                From ! {Ref, false},
                loop_seen([Term|Seen])
        end
    end.
The other is to store and read from ets (Erlang Term Storage).
has_dupes(Jid, Text) ->
    (catch ets:new(seen, [set, named_table])),
    not ets:insert_new(seen, {{Jid, Text}}).
But there is a catch: the table is owned by the process that created it and is deleted when that process dies, and because it is a named table, its name is shared by every process on the node. Another option, and a much dirtier one, is to store and read the value from the process dictionary.
has_dupes(Jid, Text) ->
    case get({Jid, Text}) of
        undefined ->
            put({Jid, Text}, seen),
            false;
        seen ->
            true
    end.
But this is nasty and you should almost never write code like this. In most cases you should use explicit state:
new_seen() -> [].

has_dupes(Jid, Text, Seen) ->
    Term = {Jid, Text},
    case lists:member(Term, Seen) of
        true -> {true, Seen};
        false -> {false, [Term|Seen]}
    end.
This is usually the best solution because it is a pure function. You can use better data structures, such as sets or maps, for better performance when you need to track a larger number of terms.
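As a rough sketch of the "better data structure" idea, the explicit-state version can keep its interface and swap the list for the standard sets module. The module name seen_set is hypothetical and used only for illustration.

```erlang
-module(seen_set).
-export([new/0, has_dupes/3]).

%% Hypothetical module: same shape as the list-based explicit-state
%% version, but membership checks go through the stdlib sets module
%% instead of lists:member/2.
new() -> sets:new().

%% Returns {IsDuplicate, NewSeen}; the caller threads NewSeen into
%% the next call, so the function itself stays pure.
has_dupes(Jid, Text, Seen) ->
    Term = {Jid, Text},
    case sets:is_element(Term, Seen) of
        true  -> {true, Seen};
        false -> {false, sets:add_element(Term, Seen)}
    end.
```

A caller would write S0 = seen_set:new(), then {false, S1} = seen_set:has_dupes("user", "hi", S0), and a second call with S1 returns {true, S1}.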

Related

idiomatic process synchronisation in Erlang

I am looking at how to code "map reduce" type scenarios directly in erlang. As a toy example, imagine I want to decide which of several files is the biggest. Those files might be anywhere on the internet, so getting each one might take some time; so I'd like to gather them in parallel. Once I have them all, I can compare their sizes.
My assumed approach is as follows:
A 'main' process to co-ordinate the work and determine which is biggest;
A 'worker' process for each file, which fetches the file and returns the size to the main process.
Here's a clunky but functioning example (using local files only, but it shows the intent):
-module(cmp).
-export([cmp/2]).

cmp(Fname1, Fname2) ->
    Pid1 = fsize(Fname1),
    Pid2 = fsize(Fname2),
    {Size1, Size2} = collect(Pid1, Pid2),
    if
        Size1 > Size2 ->
            io:format("The first file is bigger~n");
        Size2 > Size1 ->
            io:format("The second file is bigger~n");
        true ->
            io:format("The files are the same size~n")
    end.

fsize(Fname) ->
    Pid = spawn(?MODULE, fsize, [self(), Fname]),
    Pid.

fsize(Sender, Fname) ->
    Size = filelib:file_size(Fname),
    Sender ! {self(), Fname, Size}.

collect(Pid1, Pid2) ->
    receive
        {Pida, Fnamea, Sizea} ->
            io:format("Pid: ~p, Fname: ~p, Size: ~p~n", [Pida, Fnamea, Sizea])
    end,
    receive
        {Pidb, Fnameb, Sizeb} ->
            io:format("Pid: ~p, Fname: ~p, Size: ~p~n", [Pidb, Fnameb, Sizeb])
    end,
    if
        Pida =:= Pid1 -> {Sizea, Sizeb};
        Pida =:= Pid2 -> {Sizeb, Sizea}
    end.
Specific Questions
Is the approach idiomatic? i.e. hiving off each 'long running' task into a separate process, then collecting results back in a 'master'?
Is there a library to handle the synchronisation mechanics? Specifically, the collect function in the example above?
Thanks.
--
Note: I know the collect function in particular is clunky; it could be generalised by e.g. storing the pids in a list, and looping until all had completed.
In my opinion it's best to learn from an example, so I had a look at how they do that in otp/rpc and based on that I implemented a bit shorter/simpler version of the parallel eval call.
call(M, F, ArgL, Timeout) ->
    ReplyTo = self(),
    Keys = [spawn(fun() -> ReplyTo ! {self(), promise_reply, M:F(A)} end) || A <- ArgL],
    Yield = fun(Key) ->
        receive
            {Key, promise_reply, {error, _R} = E} -> E;
            {Key, promise_reply, {'EXIT', {error, _R} = E}} -> E;
            {Key, promise_reply, {'EXIT', R}} -> {error, R};
            {Key, promise_reply, R} -> R
        after Timeout -> {error, timeout}
        end
    end,
    [Yield(Key) || Key <- Keys].
I am not a MapReduce expert, but I do have some experience using this 3rd party mapreduce module, so I will try to answer your question based on my current knowledge.
First, your input should be arranged as key-value pairs in order to use the mapreduce model properly. In general, your master process should first start worker processes (or nodes). Each worker receives a map function and a key-value pair, let's call it {K1,V1}. It then executes the map function on that key and value and emits a new key-value pair {K2,V2}. The master process collects the results and waits for all workers to finish their jobs. After all workers are done, the master starts the reduce part on the pairs {K2,List[V2]} that were emitted by the workers. This part may or may not be executed in parallel; it is used to combine all the results into a single output. Note that List[V2] is a list because the workers can emit more than one value for a single K2 key.
From the 3rd party module I mentioned above:
%% Input = [{K1, V1}]
%% Map(K1, V1, Emit) -> Emit a stream of {K2,V2} tuples
%% Reduce(K2, List[V2], Emit) -> Emit a stream of {K2,V2} tuples
%% Returns a Map[K2,List[V2]]
If we look at Erlang's lists functions, the map part is essentially lists:map/2, the reduce part is in some ways similar to lists:foldl/3 or lists:foldr/3, and the two are combined in lists:mapfoldl/3 and lists:mapfoldr/3.
If you are using this pattern of mapreduce using sets of keys and values, there is no need for special synchronization if that is what you mean. You just need to wait for all workers to finish their job.
I suggest you go over the 3rd party module I mentioned above. Also take a look at this example. As you can see, the only things you need to define are the Map and Reduce functions.
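To make the map/fold analogy concrete for the file-size question, here is a minimal sketch that does not use the 3rd party module: the map step runs one process per file, and the reduce step is a plain lists:foldl/3. The module name biggest and the {none, -1} starting value are assumptions made for this example, and there is no timeout, so a crashed worker would block the caller.

```erlang
-module(biggest).
-export([biggest/1]).

%% Hypothetical module for illustration.
%% Map step: one worker per file sends {Ref, Fname, Size} to the caller.
%% Reduce step: lists:foldl/3 keeps the largest {Fname, Size} pair.
biggest(Fnames) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn(fun() ->
                          Parent ! {Ref, F, filelib:file_size(F)}
                      end),
                Ref
            end || F <- Fnames],
    %% Selective receive on each Ref collects the results in input
    %% order, however the workers happen to be scheduled.
    Results = [receive {Ref, F, Size} -> {F, Size} end || Ref <- Refs],
    lists:foldl(fun({_, S} = New, {_, Max}) when S > Max -> New;
                   (_, Acc) -> Acc
                end, {none, -1}, Results).
```

Tagging each request with a fresh reference removes the need for the hand-written two-pid collect function, because the comprehension simply waits for each tagged reply in turn.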

How to check if a list is empty in Erlang?

Basically I have a structure that includes a Value and a list of Ids.
What I want to do is map over the list of Ids and send a message to each of them, but when I first initialize the list of Ids I set it to the atom "empty_set". (Maybe I should rename it to empty_list :P).
The problem is that whenever I call the map function I want to check first if the list is an "empty_set" and if not then use the map function in it. Here is the code:
{From, set_value, V} ->
    if ViewerSet /= empty_set -> set_viewer_values(V, ViewerSet)
    end,
    looper(V, ViewerSet)
This is the function that is called:
set_viewer_values(Value, ViewerSet) ->
    if ViewerSet /= empty_set ->
        lists:map(fun(ViewerPid) ->
            ViewerPid ! {self(), set_value, Value} end, ViewerSet)
    end.
This is how I initiate the process:
process() ->
    C = spawn(fun() -> looper(no_value, empty_set) end),
    {ok, C}.
The problem is that when I run it I get this error:
=ERROR REPORT==== 2-Nov-2014::15:03:07 ===
Error in process <0.367.0> with exit value: {function_clause,[{lists,map,
[#Fun<sheet.2.12938396>,empty_set],[{file,"lists.erl"},{line,1223}]},{lists,map,2,
[{file,"lists.erl"},{line,1224}]},{sheet,cell_loop,2,[{file,"sheet.erl"},{line,93}]}]}
From what I understand, despite the if expression that I use to check whether or not the list is empty, it still tries to map over it.
So what am I doing wrong with the expression?
Thanks
Pattern matching. If you need to check for an empty list in a guard, an if, or a cond, it's almost certain that you have a structural problem with the way you are thinking about Erlang.
This will nearly always manifest in confusing code and weird edge cases that make you ask yourself things like "How do I check for an empty list?" without realizing that what you are really asking is "How do I check for an empty list as a procedural condition?" This is the bane of sane functional programming.
Edit: A touch more explanation and an example may be in order
Wherever you would want to inject pattern matching you can either use something like a case or you can break whatever you are doing out into a separate function. Very often what you find is that you've got a semantic ambiguity where things are too tightly coupled on the one hand (you're doing work other than receipt of messages within a receive) and too loosely on the other (you're engaging in a lot of arbitrary procedural checking prior to calling a function, when really matching on parameters is the natural solution).
looper(V, ViewerSet) ->
    receive
        {From, set_value, V} ->
            set_viewer_values(V, ViewerSet),
            looper(V, ViewerSet);
        % OtherStuff ->
        %     whatever else looper/2 does...
    end.

set_viewer_values(V, []) ->
    set_default_values(V);
set_viewer_values(V, ViewerSet) ->
    % ... whatever the normal function definition is...
Wherever you are dispatching to from within your receive is what should be doing the actual work, and that is also the place you want to be doing matching. Since that is a function-call away anyway matching here is a good fit and simplifies your code.
If you want to match in looper/2 itself this is certainly possible. I don't know what you want to do when you receive an empty list, so I'll make up something, but you can do whatever you want:
looper(V, []) ->
    looper(V, default_set());
looper(V, ViewerSet) ->
    % As before, or whatever makes sense.
You could even decide that when you have an empty set you need to operate in a totally different way:
full_looper(V, []) ->
    empty_looper(V);
full_looper(V, ViewerSet) ->
    receive
        {new_set, Set} ->
            full_looper(V, Set);
        {From, set_value, V} ->
            set_viewer_values(V, ViewerSet),
            full_looper(V, ViewerSet)
    end.

empty_looper(V) ->
    receive
        {new_set, Set} ->
            full_looper(V, Set);
        {From, set_value, V} ->
            set_viewer_values(V, default_set()),
            empty_looper(V)
    end.
My point above is that there are many ways to handle the case of having an empty set without resorting to arbitrary procedural checking, and all of them read easier once you know your way around (until you get used to doing things this way, though, it can feel pretty weird). As a side note, the last example is actually creating a finite state machine -- and there is an OTP module already to make creating FSMs really easy. (They are easy to write by hand in Erlang, too, but even easier with the gen_fsm module.)
Try a case expression to check whether the list is empty, rather than recursion?
On both if expressions, what happens if ViewerSet is empty_set? There's no guard that handles this case.
if expressions in Erlang are not the typical if expressions you see in other languages. From the little experience I have, they are mostly avoided and for a good reason: (as another answer already mentioned) pattern matching can be used to check for equality and other comparison operations (through guards).
The following is taken from here:
If no guard sequence is true, an if_clause run-time error will occur. If necessary, the guard expression true can be used in the last branch, as that guard sequence is always true.
Example:
is_greater_than(X, Y) ->
    if
        X > Y ->
            true;
        true -> % works as an 'else' branch
            false
    end.
So if expressions end up being a sort of case with boolean guards as their clauses, and they tend to introduce more confusion than clarity. Some people even avoid using if expressions altogether.
My suggestion is that every time you find yourself using an if expression, ask yourself how you can replace it with pattern matching, either with a case or as part of a function clause.
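As an illustration of that suggestion, here is one way the comparison above could be written with guarded function clauses, plus a clause-per-shape check for an empty list. The module name no_if and the function describe/1 are hypothetical.

```erlang
-module(no_if).
-export([is_greater_than/2, describe/1]).

%% Guarded function clauses instead of an if expression.
is_greater_than(X, Y) when X > Y -> true;
is_greater_than(_, _) -> false.

%% Matching on the shape of the argument instead of testing it.
describe([]) -> empty;
describe([_|_]) -> non_empty.
```

If no clause matches, for example describe(empty_set), the call fails with a function_clause error, which tends to surface a wrong initial value early.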
If you have a list of ids in the variable ViewerSet, simply initialize it with the empty list: [].
Then when you receive the message {From, set_value, V} you can execute a function for each element of the list (even if it is empty) using lists:foreach/2 or a list comprehension:
{From, set_value, V} ->
    lists:foreach(fun(ViewerPid) -> ViewerPid ! {self(), set_value, V} end, ViewerSet),
    looper(V, ViewerSet);
...
or
{From, set_value, V} ->
    [ViewerPid ! {self(), set_value, V} || ViewerPid <- ViewerSet],
    looper(V, ViewerSet);
...
Based on your code, this is what you should get:
(shell#a)8> Val.
myatom
(shell#a)9> if Val /= myatom -> lists:map(fun(X) -> io:format("~p",[X]) end, Val) end.
** exception error: no true branch found when evaluating an if expression
(shell#a)10>
So it seems the problem resides somewhere else.

Why does Erlang allow putting parentheses after a fun?

This question is about some syntax a partner came across today; although we understand how it works, we don't understand why it is allowed (what is its use?).
Look at this snippet:
fun() -> ok end().
Without the last pair of parentheses this will produce something like:
#Fun<erl_eval.20.82930912>
But with them, the function is evaluated producing:
ok
My question is: why is that syntax allowed in Erlang? Why would I want to create a function just to call it immediately instead of just writing out its contents? Is there any practical use for it?
The only thing we could think of was introducing local variables inside the fun's body (but that would look ugly and unclear to me).
Please note that this other syntax is not allowed in Erlang, even though it follows the same concept as the former:
fun() -> fun() -> ok end end()().
(It would mean: a function A that returns a function B. And I'm evaluating A (thus producing B) and then evaluating B to get 'ok').
The syntax you mentioned is a natural outcome of Erlang's being functional.
In Erlang, functions are values (stored as closures).
The value of fun() -> ok end is a function, which takes nothing and returns ok. When we put parentheses after it, we are calling that function. Another way to demonstrate this is:
> F = fun() -> ok end.
#Fun<erl_eval.20.80484245>
> F().
ok
The functions in the second example of yours need to be grouped properly in order for the parser to make sense of them.
As for your question -- "why this syntax is allowed", I'd have to say it's a natural outcome of functions being values in Erlang. This ability enables the functional style of programming. Here is an example:
> lists:map(fun(X) -> X * 2 end, [1,2,3]).
[2,4,6]
The above code is in essence this:
> [fun(X) -> X * 2 end(1), fun(X) -> X * 2 end(2), fun(X) -> X * 2 end(3)].
[2,4,6]
A "natural outcome" is just a natural outcome; it doesn't have to be of any practical use. So you will probably never see code like (fun() -> fun() -> ok end end())(). being used :)
You typically won't have much use for the syntax fun() -> ok end (). But it can be useful to do something like (find_right_fun()) (), which is basically the same thing: an expression that evaluates to a function.
Note that the Erlang parser requires you to specify the precedence using () to sort out the meaning of ()(), i.e. your second example should be (fun() -> fun() -> ok end end()) ().
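A small sketch of the (find_right_fun())() idea: a helper picks a fun, and the parenthesised call is applied straight away. The module name fun_demo and the helper find_op/1 are hypothetical names for this example.

```erlang
-module(fun_demo).
-export([apply_op/3]).

%% find_op/1 is a hypothetical helper that returns a fun.
find_op(add) -> fun(A, B) -> A + B end;
find_op(mul) -> fun(A, B) -> A * B end.

%% The parenthesised expression evaluates to a fun, which is then
%% called with (A, B), just like fun() -> ok end().
apply_op(Name, A, B) ->
    (find_op(Name))(A, B).
```

For instance, fun_demo:apply_op(add, 2, 3) evaluates find_op(add) first and then calls the resulting fun with 2 and 3.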

Dynamic pattern matching

How can I do dynamic pattern matching in Erlang?
Suppose I have the function filter/2:
filter(Pattern, Array)
where Pattern is a string with the pattern I want to match (e.g. "{book, _ }" or "{ebook, _ }") typed by a user, and Array is an array of heterogeneous elements (e.g. {dvd, "The Godfather" }, {book, "The Hitchhiker's Guide to the Galaxy" }, {dvd, "The Lord of Rings"}, etc.). I would like filter/2 above to return the array of elements in Array that match Pattern.
I've tried some ideas with erl_eval without any success...
Thanks in advance.
After a little study of the documentation:
Eval = fun(S) ->
    {ok, T, _} = erl_scan:string(S),
    {ok, [A]} = erl_parse:parse_exprs(T),
    {value, V, _} = erl_eval:expr(A, []),
    V
end,
FilterGen = fun(X) -> Eval(lists:flatten(["fun(",X,")->true;(_)->false end."])) end,
lists:filter(FilterGen("{book, _}"), [{dvd, "The Godfather" } , {book, "The Hitchhiker's Guide to the Galaxy" }, {dvd, "The Lord of Rings"}]).
[{book,"The Hitchhiker's Guide to the Galaxy"}]
Is there any special reason why you want the pattern in a string?
Patterns as such don't exist in Erlang, they can really only occur in code. An alternative is to use the same conventions as with ETS match and select and write your own match function. It is really quite simple. The ETS convention uses a term to represent a pattern where the atoms '$1', '$2', etc are used as variables which can be bound and tested, and '_' is the don't care variable. So your example patterns would become:
{book,'_'}
{ebook,'_'}
{dvd,"The Godfather"}
This is probably the most efficient way of doing it. There is the possibility of using match specifications here, but it would complicate the code. It depends on how complicated your matching needs to be.
EDIT:
Here, without comment, is the code for part of the matcher:
%% match(Pattern, Value) -> {yes,Bindings} | no.
match(Pat, Val) ->
    match(Pat, Val, orddict:new()).

match([H|T], [V|Vs], Bs0) ->
    case match(H, V, Bs0) of
        {yes,Bs1} -> match(T, Vs, Bs1);
        no -> no
    end;
match('_', _, Bs) -> {yes,Bs};              %Don't care variable
match(P, V, Bs) when is_atom(P) ->
    case is_variable(P) of
        true -> match_var(P, V, Bs);        %Variable atom like '$1'
        false ->
            %% P just an atom.
            if P =:= V -> {yes,Bs};
               true -> no
            end
    end.

match_var(P, V, Bs) ->
    case orddict:find(P, Bs) of
        {ok,B} when B =:= V -> {yes,Bs};
        {ok,_} -> no;
        error -> {yes,orddict:store(P, V, Bs)}
    end.
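Since the code above is only part of a matcher, here is one possible way to fill in the rest so that it can be tried out. The tuple clause, the fall-through clauses, is_variable/1 and filter/2 are my own assumptions about how the missing parts might look, and the module name pmatch is hypothetical.

```erlang
-module(pmatch).
-export([match/2, filter/2]).

%% match(Pattern, Value) -> {yes, Bindings} | no.
match(Pat, Val) -> match(Pat, Val, orddict:new()).

match('_', _, Bs) -> {yes, Bs};                 % don't-care variable
match(P, V, Bs) when is_atom(P) ->
    case is_variable(P) of
        true -> match_var(P, V, Bs);            % variable atom like '$1'
        false when P =:= V -> {yes, Bs};        % plain atom, equal
        false -> no
    end;
%% Assumed clause: tuples of equal size are matched element by element.
match(P, V, Bs) when is_tuple(P), is_tuple(V),
                     tuple_size(P) =:= tuple_size(V) ->
    match(tuple_to_list(P), tuple_to_list(V), Bs);
match([H|T], [V|Vs], Bs0) ->
    case match(H, V, Bs0) of
        {yes, Bs1} -> match(T, Vs, Bs1);
        no -> no
    end;
match(P, P, Bs) -> {yes, Bs};                   % identical terms, e.g. [] or 42
match(_, _, _) -> no.

match_var(P, V, Bs) ->
    case orddict:find(P, Bs) of
        {ok, B} when B =:= V -> {yes, Bs};
        {ok, _} -> no;
        error -> {yes, orddict:store(P, V, Bs)}
    end.

%% Assumed definition: a variable is '$' followed by one or more digits.
is_variable(A) ->
    case atom_to_list(A) of
        [$$ | Digits] when Digits =/= [] ->
            lists:all(fun(C) -> C >= $0 andalso C =< $9 end, Digits);
        _ -> false
    end.

%% Keep the elements of List that match the ETS-style pattern Pat.
filter(Pat, List) ->
    [E || E <- List, match(Pat, E) =/= no].
```

With this sketch, pmatch:filter({book,'_'}, Items) keeps only the book tuples, and a pattern such as {'$1','$1'} matches only tuples whose two elements are equal.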
You can use lists:filter/2 to do the filtering part. Converting the string to code is a different matter. Are all the patterns in the form of {atom, _}? If so, you might be able to store the atom and pass that into the closure argument of lists:filter.
Several possibilities come to mind, depending on how dynamic the patterns are and what features you need in your patterns:
If you need exactly the syntax of Erlang patterns and the pattern doesn't change very often, you could generate the matching source code and write it to a file. Use compile:file to create a binary and load it with code:load_binary.
Advantage: very fast matching.
Disadvantage: overhead whenever the pattern changes.
Put the data from Array into ETS and use match specifications to get the data out.
You might use fun2ms to help create the match specification, but fun2ms is normally used as a parse transform at compile time. There is also a mode used by the shell that can probably be made to work from strings with the help of the parser. For details see ms_transform.
There might also be some way to use qlc, but I didn't look into this in detail.
In any case, be careful to sanitize your matching data if it comes from untrusted sources!
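A minimal sketch of the ETS option, assuming the pattern is already an ETS-style term such as {book,'_'} rather than a string, and that every element of the input is a non-empty tuple (ETS uses the first element as the key). It uses ets:match_object/2 rather than a full match specification. The module name ets_filter is hypothetical, and the order of the returned elements is not guaranteed.

```erlang
-module(ets_filter).
-export([filter/2]).

%% Hypothetical helper: load Items into a temporary table, let ETS do
%% the matching, then drop the table again.
filter(Pattern, Items) ->
    %% duplicate_bag keeps repeated keys and repeated identical tuples.
    Tab = ets:new(items, [duplicate_bag]),
    true = ets:insert(Tab, Items),
    Matches = ets:match_object(Tab, Pattern),
    true = ets:delete(Tab),
    Matches.
```

Creating and deleting a table on every call is only reasonable for a demonstration; if the data already lives in ETS, the single ets:match_object/2 call is all that is needed.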

Design pattern? Function iterating through a list in search of the first {success} result

I've got a coding problem in Erlang that is probably a common design pattern, but I can't find any info on how to resolve it.
I've got a list L. I want to apply a function f to every element in L, and have it run across all elements in L concurrently. Each call to f(Element) will either succeed or fail; in the majority of cases it will fail, but occasionally it will succeed for a specific Element within L.
If/when a f(Element) succeeds, I want to return "success" and terminate all invocations of f for other elements in L - the first "success" is all I'm interested in. On the other hand, if f(Element) fails for every element in L, then I want to return "fail".
As a trivial example, suppose L is a list of integers, and F returns {success} if an element in L is 3, or {fail} for any other value. I want to find as quickly as possible if there are any 3s in L; I don't care how many 3s there are, just whether at least one 3 exists or not. f could look like this:
f(Int) ->
case Int of
3 -> {success};
_ -> {fail}
end.
How can I iterate through a list of Ints to find out if the list contains at least one 3, and return as quickly as possible?
Surely this is a common functional design pattern, and I'm just not using the right search terms within Google...
There are basically two different ways of doing this. Either write your own function which iterates over the list, returning true or false depending on whether it finds a 3:
contains_3([3|_]) -> true;
contains_3([_|T]) -> contains_3(T);
contains_3([]) -> false.
The second is to use an already defined function that does the actual iteration until a test on a list element is true, and to provide it with the test. lists:any returns true or false depending on whether the test succeeds for at least one element:
contains_3(List) -> lists:any(fun (E) -> E =:= 3 end, List).
will do the same thing. Which you choose is up to you. The second one would probably be closer to a design pattern but I feel that even if you use it you should have an idea of how it works internally. In this case it is trivial and very close to the explicit case.
It is a very common thing to do, but whether it would classify as a design pattern I don't know. It seems so basic and in a sense "trivial" that I would hesitate to call it a design pattern.
It has been a while since I did any erlang, so I'm not going to attempt to provide you with syntax, however erlang and the OTP have the solution waiting for you.
Spawn one process representing the function; have it iterate over the list, spawning off as many processes as you feel is appropriate to perform the per-element calculation efficiently.
Link every process to the function-process, and have the function process terminate after it returns the first result.
Let Erlang/OTP clean up the rest of the processes.
As has already been answered your solution is to use lists:any/2.
Seeing that you want a concurrent version of it:
any(F, List) ->
    Parent = self(),
    Pid = spawn(fun() -> spawner(Parent, F, List) end),
    receive {Pid, Result} -> Result
    end,
    Result.

spawner(Parent, F, List) ->
    Spawner = self(),
    S = spawn_link(fun() -> wait_for_result(Spawner, Parent, length(List)) end),
    %% Each worker gets its own element X, so that F is applied to it.
    [spawn_link(fun() -> run(S, F, X) end) || X <- List],
    receive after infinity -> ok end.

wait_for_result(Spawner, Parent, 0) ->
    Parent ! {Spawner, false},
    exit(have_result);
wait_for_result(Spawner, Parent, Children) ->
    receive
        true -> Parent ! {Spawner, true}, exit(have_result);
        false -> wait_for_result(Spawner, Parent, Children - 1)
    end.

run(S, F, X) ->
    case catch(F(X)) of
        true -> S ! true;
        _ -> S ! false
    end.
Note that all the children (the "run" processes) will die when the "wait_for_result" process does an exit(have_result): the exit signal takes down the linked spawner process, which in turn takes down the workers linked to it.
Completely untested... Ah, what the heck. I'll do an example:
4> play:any(fun(A) -> A == a end, [b,b,b,b,b,b,b,b]).
false
5> play:any(fun(A) -> A == a end, [b,b,b,b,b,b,a,b]).
true
There could still be bugs (and there probably are).
You might want to look at the plists module: http://code.google.com/p/plists/ Though I don't know if plists:any handles
(a) on the 1st {success} received, tell the other sub-processes to stop processing & exit ASAP
