I'm trying to do something like this:
test(Tester) ->
case Tester of
start -> X = 4
ok -> X = X + 5,
Y = X/5;
terminate -> Y
end.
But not exactly this. I Know it can be achieved with tail or simple recursion.
normally X and Y are unbound.
Is there any way to communicate between these cases without the usage of erlang global variables?
Erlang is a functional language it means we don't communicate between different parts of the code or store values in variables without purpose, we just compute to return value (and sometimes make some sideeffect). If we have common computation in different branches of code we can simply place it in common function.
test(Tester) ->
case Tester of
start -> 4;
ok -> computeY();
terminate -> computeY()
end.
computeY() ->
X = 4 + 5,
X/5.
If you need to access a variable in any clause body of a case statement, You have to assign it before case statement or sometimes you can assign it in case clause pattern:
test(Arg) ->
Size = get_size(Arg), % I will use 'Size' in each clause body
case Arg of
#{foo := Foo} -> % if Arg is an Erlang map and has key 'foo'
% I can use 'Foo' only here:
io:format("It's a map with size ~p and has key foo with value ~p\n", [Size, Foo]);
[Baz|_] -> % if Arg is an Erlang list with at least one element
% I can use 'Baz' only here and for example i can NOT use 'Foo' here:
io:format("It's a list with size ~p and its first element is ~p\n", [Size, Baz]);
_ ->
io:format("Unwanted argument ~p with ~p size\n", [Arg, Size])
end.
get_size(X) when is_map(X) -> map_size(X);
get_size(X) when is_list(X) -> length(X);
get_size(_) -> unknown.
I put above code in an Erlang fun named Test to using it in shell without need to compile a module file:
1> Test([5,4,3,2,1]).
It's a list with size 5 and its first element is 5
ok
2> Test(#{key => value, foo => ':)'}).
It's a map with size 2 and has key foo with value ':)'
ok
3> Test([]).
Unwanted argument [] with 0 size
ok
4> Test(#{key => value}).
Unwanted argument #{key => value} with 1 size
ok
5> Test(13).
Unwanted argument 13 with unknown size
ok
If you are curious about variable binding in case, I recommend to read this article
To do this in Erlang you would start (spawn) a process that would hold X in memory as well as the PID (process id) of the process it is supposed to reply to unless you want to pass it a different PID every time along with start/ok/terminate. Processes in Erlang have their own memory, state or loopData. After you spawn a process that knows how to handle specific messages, you pass it the messages and it replies by sending a message back.
start_test() ->
TestPID = spawn(?MODULE, test, [self()]),
TestPID ! start,
receive
X -> io:format("X is: ~p~n",[X]
end,
TestPID ! ok,
receive
{X,Y} -> io:format("X is: ~p, Y is: ~p~n",[X, Y]
end,
TestPID ! terminate,
receive
Y -> io:format("Y is: ~p~n",[Y]
end.
test(PID) ->
receive
start -> PID ! 4,
test(4, undefined, PID);
terminate -> undefined
end.
test(X, Y, PID) ->
receive
ok -> PID ! {X+5, (X+5)/5},
test(X+5, (X+5)/5, PID);
terminate -> PID ! Y
end.
Don't forget to create a module and export start_test/0 and test/1
If you run start_test() you should get an output of
X is: 4
X is: 9, Y is: 1.8
Y is: 1.8
Related
I am looking at how to code "map reduce" type scenarios directly in erlang. As a toy example, imagine I want to decide which of several files is the biggest. Those files might be anywhere on the internet, so getting each one might take some time; so I'd like to gather them in parallel. Once I have them all, I can compare their sizes.
My assumed approach is as follows:
A 'main' process to co-ordinate the work and determine which is biggest;
A 'worker' process for each file, which fetches the file and returns the size to the main process.
Here's a clunky but functioning example (using local files only, but it shows the intent):
-module(cmp).
-export([cmp/2]).
cmp(Fname1, Fname2) ->
Pid1 = fsize(Fname1),
Pid2 = fsize(Fname2),
{Size1, Size2} = collect(Pid1, Pid2),
if
Size1 > Size2 ->
io:format("The first file is bigger~n");
Size2 > Size1 ->
io:format("The second file is bigger~n");
true ->
io:format("The files are the same size~n")
end.
fsize(Fname) ->
Pid = spawn(?MODULE, fsize, [self(), Fname]),
Pid.
fsize(Sender, Fname) ->
Size = filelib:file_size(Fname),
Sender ! {self(), Fname, Size}.
collect(Pid1, Pid2) ->
receive
{Pida, Fnamea, Sizea} ->
io:format("Pid: ~p, Fname: ~p, Size: ~p~n", [Pida, Fnamea, Sizea])
end,
receive
{Pidb, Fnameb, Sizeb} ->
io:format("Pid: ~p, Fname: ~p, Size: ~p~n", [Pidb, Fnameb, Sizeb])
end,
if
Pida =:= Pid1 -> {Sizea, Sizeb};
Pida =:= Pid2 -> {Sizeb, Sizea}
end.
Specific Questions
Is the approach idiomatic? i.e. hiving off each 'long running' task into a separate process, then collecting results back in a 'master'?
Is there a library to handle the synchronisation mechanics? Specifically, the collect function in the example above?
Thanks.
--
Note: I know the collect function in particular is clunky; it could be generalised by e.g. storing the pids in a list, and looping until all had completed.
In my opinion it's best to learn from an example, so I had a look at how they do that in otp/rpc and based on that I implemented a bit shorter/simpler version of the parallel eval call.
call(M, F, ArgL, Timeout) ->
ReplyTo = self(),
Keys = [spawn(fun() -> ReplyTo ! {self(), promise_reply, M:F(A)} end) || A <- ArgL],
Yield = fun(Key) ->
receive
{Key, promise_reply, {error, _R} = E} -> E;
{Key, promise_reply, {'EXIT', {error, _R} = E}} -> E;
{Key, promise_reply, {'EXIT', R}} -> {error, R};
{Key, promise_reply, R} -> R
after Timeout -> {error, timeout}
end
end,
[Yield(Key) || Key <- Keys].
I am not a MapReduce expert but I did had some experience using this 3rd party mapreduce module. So I will try to answer your question based on my current knowledge.
First, your input should be arranged as pairs of keys and values in order to properly use the mapreduce model. In general, your master process should first start workers processes (or nodes). Each worker receives a map function and a pair of key and value, lets name it {K1,V1}. It then executes the map function with the key and value and emits a new pair of key and value {K2,V2}. The master process collects the results and waits for all workers to finish their jobs. After all workers are done, the master starts the reduce part on the pairs {K2,List[V2]} that were emited by the workers. This part can be executed in parallel or not, it used to combine all the results into a single output. Note that the List[V2] is because there can be more then one value that was emited by the workers for a single K2 key.
From the 3rd party module I mentioned above:
%% Input = [{K1, V1}]
%% Map(K1, V1, Emit) -> Emit a stream of {K2,V2} tuples
%% Reduce(K2, List[V2], Emit) -> Emit a stream of {K2,V2} tuples
%% Returns a Map[K2,List[V2]]
If we look into Erlangs' lists functions, the map part is actually equal for doing lists:map/2 and the reduce part is in some way similar to lists:foldl/3 or lists:foldr/3 and the combination between them are: lists:mapfoldl/3, lists:mapfoldr/3.
If you are using this pattern of mapreduce using sets of keys and values, there is no need for special synchronization if that is what you mean. You just need to wait for all workers to finish their job.
I suggest you to go over the 3rd party module I mentioned above. Take also a look at this example. As you can see, the only things you need to define are the Map and Reduce functions.
I see questions similar like this ones, but eventually, for different programming languages. I'm trying to solve this little problem:
Given a string, find the length of the longest substring without
repeating characters. For example, the longest substring without
repeating letters for abcabcbb is abc, which the length is 3. For
bbbbb the longest substring is b, with the length of 1.
I don't need the anwer to it but why what I have so far fails in the second iteration.
1> longest_substring:main("abcabcbb").
H: 97, T: "bcabcbb", Sub: []
Is 97 in []? false
H: 98, T: "cabcbb", Sub: 97
** exception error: no function clause matching string:chr(97,98,1) (string.erl, line 99)
in function longest_substring:process/2 (src/leetcode/algorithms/longest_substring.erl, line 28)
2>
This is the source code:
-module(longest_substring).
-export([main/1]).
-spec main(S :: string()) -> string().
%%%==================================================================
%%% Export
%%%==================================================================
main(S) -> process(S, "").
%%%==================================================================
%%% Internal
%%%==================================================================
process([], Sub) -> string:len(Sub);
process([H | T], Sub) ->
io:format("H: ~w, T: ~p, Sub: ~p~n", [H, T, Sub]),
Found = string:chr(Sub, H),
io:format("Is ~w in ~p? ~p~n", [H, Sub, Found =/= 0]),
% Don't know how to make this `if` thing better...
if
Found > 0 -> process(T, H);
_ -> process(T, string:concat(Sub, H))
end.
You have two places where you are treating character H as a string, both within the if:
if
Found > 0 -> process(T, H);
_ -> process(T, string:concat(Sub, H))
end.
Both appearances of H here need to be [H] instead, to form a string from the single character. (Also, your final clause in the if needs to use true, not an underscore — you should be getting a compiler error about this.)
Currently your solution returns a number, not a string. It also fails to remember any longer substring that might appear early in the string. To fix that, you need to remember the longest substring you've seen so far, which means you need another accumulator:
-module(longest_substring).
-export([main/1]).
-spec main(S :: string()) -> string().
main(S) -> process(S, {0,[]}, {0,[]}).
process([], {LL,Last}, {LG,_}) when LL > LG -> Last;
process([], _, {_,Long}) -> Long;
process([H | T], {LL,Last}=Sub, {LG,_}=Long) ->
case string:rchr(Last, H) of
0 ->
process(T, {LL+1,string:concat(Last,[H])}, Long);
N ->
NewLast = {1+LL-N,string:substr(Last,N+1)++[H]},
process(T, NewLast,
case LL > LG of
true ->
Sub;
false ->
Long
end)
end.
The main/1 function passes two accumulators to process/3, each of which is a pair of a length and a list. The first accumulator tracks the current substring, and the second tracks the longest substring seen so far.
In the last clause of process/3, we first check if H is in the current substring; if not, we add it to the current substring, increase its length by 1, and call process/3 again with the tail of the string. But if H is found in the current substring, we calculate the new current substring using the return value of string:rchr/2 to preserve the longest portion of the previous substring that we can (the original solution does not do this). We then check to see if the length of the current substring is greater than the current longest substring, and if so, we make it the longest substring, or if not we throw it away and keep the current longest substring, and then continue with the tail of the string. (Note that we could also make this check for greater or equal instead of greater; this would make our function return the last longest substring we find rather than the first.)
The first two clauses of process/3 handle the case where the input string has been fully processed. They just decide if the current substring is longer than the longest seen so far and return the longer of the two. (The alternative of using a greater or equal comparison applies here as well.)
for fun, I propose you to avoid complex search. In this solution I create a process for each element of the list holding: the element itself, the Pid of the next process/element in the list, and the Pid of the caller.
To initiate the search, I send to each process/element an empty list.
Each time a process/element receives a list, it checks if its stored element is a member of the received list. If yes, the list is send back to the caller, if not the element is prepend to the list and the new list is sent to the next process/element to continue the evaluation.
The caller process simply waits for as many returned messages as it has sent.
I have added a stop message and a special case for the last element of the list.
-module (longer).
-compile([export_all]).
char_proc(V,Next,Caller) ->
receive
stop -> ok;
Str ->
case lists:member(V,Str) of
true -> Caller ! Str;
false -> send(Next,Caller,[V|Str])
end,
char_proc(V,Next,Caller)
end.
send(noproc,Caller,Str) -> Caller ! Str;
send(Next,_,Str) -> Next ! Str.
find(Str) ->
Me = self(),
Pids = tl(lists:reverse(lists:foldl(fun(X,Acc) -> Pid = spawn(?MODULE,char_proc,[X,hd(Acc),Me]), [Pid|Acc] end,[noproc],Str))),
[X ! [] || X <- Pids],
R = receive_loop(0,[],length(Str)),
[X ! stop || X <- Pids],
R.
receive_loop(N,S,0) -> {N,S};
receive_loop(N,S,I) ->
receive
M when length(M) > N ->
receive_loop(length(M),M,I-1);
_ ->
receive_loop(N,S,I-1)
end.
tested in the shell:
1> c(longer).
{ok,longer}
2> longer:find("abcdad").
{4,"abcd"}
3> longer:find("abcdadtfrseqgepz").
{9,"adtfrseqg"}
4> longer:find("aaaaaaaaaaaa").
{1,"a"}
5> longer:find("abcdatfrseqgepz").
{11,"bcdatfrseqg"}
6>
Note there is no guarantee about witch sub-string will be returned if it exists several solutions, it is very easy to modify the code to return either the first sub-string or all of them.
I just had a simple question and I can't seem to find an answer to it. Basically my question is if I have a function recording state (recursive) and I send multiple messages to it, will it keep going through the receive block until it no longer has messages in its "mailbox"?
state(Fridge) ->
receive
Pat1 ->
io:format("ok"),
state(S);
Pat2 ->
io:format("not ok"),
state(S)
end.
So if I'd send to this process 3 messages (Pat1, Pat2, Pat1) using "!" and its not able to go into its loop before receiving messages will it still print out the following?
1> "ok"
2> "not ok"
3> "ok"
Sorry if this isn't very clearly put, by simplifying the question it might make it hard to picture what I am asking.
Your question isn't clear but you seem to be asking whether the process will receive the three messages even if they're sent before the target process has called receive — if that's the question, the answer is yes. When you send a message to a process, it goes into that process's message queue until the process calls receive to remove it from the message queue and deal with it.
If you call erlang:process_info(Pid, messages) where Pid is the receiver's process id, you can see what messages are in its queue. You might try this from the Erlang shell.
As an extreme example of message queueing, under some heavy load conditions it can be a source of out-of-memory problems if a receiver can't keep up with a fast sender. Under these conditions, a receiver's message queue might grow without bound until the system runs out of memory.
Does this answer your question?
1> OUT = fun(X) -> io:format(">>> ~p~n", [X]) end.
#Fun<erl_eval.6.54118792>
2> F = fun X() -> receive foo -> OUT(foo), X(); bar -> OUT(bar), X() end end.
#Fun<erl_eval.44.54118792>
3> P = spawn(F).
<0.38.0>
4> [ P ! X || X <- [foo, bar, foo]].
>>> foo
[foo,bar,foo]
>>> bar
>>> foo
A message arrives in the mailbox and then receive patterns are applied to the message like the function clause or case statement. But unlike those, if none of them match, next message is processed and the previous one left in the message box untouched. Other receive clause starts always from beginning of message queue.
1> OUT = fun(X) -> io:format(">>> ~p~n", [X]) end.
#Fun<erl_eval.6.54118792>
2> F = fun() -> receive start -> (fun X() -> receive foo -> OUT(foo), X(); bar -> OUT(bar), X() end end)() end end.
#Fun<erl_eval.20.54118792>
3> P = spawn(F).
<0.38.0>
4> [ P ! X || X <- [foo, bar, foo]].
[foo,bar,foo]
5> P! start.
>>> foo
start
>>> bar
>>> foo
Note that foo, bar, foo is in the queue in the first receive but it is not processed. When start arrives (last in the queue) second receive starts process foo and bar messages.
-module(wy).
-compile(export_all).
state() ->
timer:sleep(2000),
io:format("~p~n", [erlang:process_info(self(), messages)]),
receive
pat1 ->
io:format("ok~n"),
state();
pat2 ->
io:format("not ok~n"),
state()
end.
main() ->
Pid = spawn(fun state/0),
[ Pid ! X || X <- [pat1, pat2, pat1]].
You can run this code, from the output you can understand everything.
**Note:**when you send message to a process, the sent messages will be stored in mail box of the process.
6> wy:main().
[pat1,pat2,pat1]
7> {messages,[pat1,pat2,pat1]}
7> ok
7> {messages,[pat2,pat1]}
7> not ok
7> {messages,[pat1]}
7> ok
7> {messages,[]}
7>
One comments for your code:
receive
Pat1 ->
io:format("ok"),
state(S);
Pat2 ->
io:format("not ok"),
state(S)
end.
the second clause will not match forever.
I've been making a chat application in Erlang for a school project, but I've run into an issue. I'm trying to make my program concurrent and in order to do so I start a new thread for each message a channel is sending. I do this using lists:foreach, but I want to make sure that I don't message the person who typed in the channel.
request(State, {message, {UserID,UserPID}, Token}) ->
case catch lists:member({UserID,UserPID}, State#channel.users) of
false ->
{{error, user_not_joined}, State};
true ->
spawn( fun() ->
ListOfUsers = State#channel.users,
UserPIDs = lists:map(fun ({_, V}) -> V end, ListOfUsers),
%spawn( fun() ->end)
lists:foreach(
fun(UserPID) -> ok end,
fun(PID) ->
spawn( fun() -> genserver:request(PID, {message_from_server, State#channel.name, UserID, Token}) end)
end, UserPIDs) end),
{ok, State}
end;
What I pretty much want to do is to match with the UserPID inside the foreach. As of now I only get warnings:
channel.erl:39: Warning: variable 'UserPID' is unused
channel.erl:39: Warning: variable 'UserPID' shadowed in 'fun'
Line 3 is fun(UserPID) -> ok end,
Cheers
The answer by legoscia is fine, but I'd add that often list comprehension is simpler to use and simpler to read than lists:foreach. Note that list comprehension is able to ignore values based on clauses. Consider the following example:
-module(filter).
-export([do/0]).
do() ->
Values = lists:seq(1,10),
IgnoreThisValue = 5,
%% print all values unequal to IgnoreThisValue
[io:format("Value: ~b~n", [Value]) ||
Value <- Values, Value =/= IgnoreThisValue],
ok.
Run it in the shell:
1> make:all([load]).
Recompile: filter
up_to_date
2> filter:do().
Value: 1
Value: 2
Value: 3
Value: 4
Value: 6
Value: 7
Value: 8
Value: 9
Value: 10
A side note on your code: Why do you spawn a thread per process? I assume that you are using the behaviour gen_server (correct me if I am wrong). If so, you should consider using the cast function to simply send a message. As you do not check the result of genserver:request/2, this might be a viable option which saves you a lot of processes.
Since the function argument shadows the existing variable, you need to use a guard for that:
fun(PID) when PID =:= UserPID -> ok end
From the Learn You Some Erlang for Great Good!
Another special case is when the timeout is at 0:
flush() ->
receive
_ -> flush()
after 0 ->
ok
end
.
When that happens, the Erlang VM will try and find a message that fits
one of the available patterns. In the case above, anything matches. As
long as there are messages, the flush/0 function will recursively call
itself until the mailbox is empty. Once this is done, the after 0 ->
ok part of the code is executed and the function returns.
I don't understand purpose of after 0. After reading above text I thought it was like after infinity (waiting forever) but I changed a little the flush function:
flush2() ->
receive
_ -> timer:sleep(1000), io:format("aa~n"), flush()
after 0 ->
okss
end
.
flush3() ->
receive
_ -> io:format("aa~n"), flush()
after 0 ->
okss
end
.
In the first function it waits 1 second and in the second function it doesn't wait.
In both cases it doesn't display a text (aa~n).
So it doesn't work as after infinity.
If block between the receive and the after are not executed then above 2 codes can be simplified to:
flush4() ->
okss
.
What I am missing?
ps. I am on the Erlang R16B03-1, and author of the book was, as fair I remember, was on the Erlang R13.
Every process has a 'mailbox' -- message queue. Messages can be fetched by receive. if there is no messages in the queue. after part specifies how much time 'receive will wait for them. So after 0 -- means process checking (by receive ) if any messages in the queue and if queue is empty immediately continue to next instructions.
It can be used for instance if we want periodically check if any messages here and to do something (hopefully helpful) if there is no messages.
Consider after 0 to be finally.
Consider the use of after 0 to process receives with a priority: http://learnyousomeerlang.com/more-on-multiprocessing#selective-receives
May this different look on things enlighten you.
You can play with the following shell command to understand the effect of the after command:
4> L = fun(G) ->
4> receive
4> stop -> ok;
4> M -> io:format("received ~p~n",[M]), G(G)
4> after 0 ->
4> io:format("no message~n")
4> end
4> end.
#Fun<erl_eval.6.80484245>
5> F = fun() -> timer:sleep(10000),
5> io:format("end of wait for messages, go to receive block~n"),
5> L(L)end.
#Fun<erl_eval.20.80484245>
6> spawn(F).
<0.46.0>
end of wait for messages, go to receive block
no message
7> P1 = spawn(F).
<0.52.0>
8> P1 ! hello.
hello
end of wait for messages, go to receive block
received hello
no message
9> P2 ! hello, P2 ! stop.
* 1: variable 'P2' is unbound
8> P2 = spawn(F).
<0.56.0>
9> P2 ! hello, P2 ! stop.
stop
end of wait for messages, go to receive block
received hello
10>
If you do not intend to use a nested receive, rather than using "after" part, I think a better approach is to use "Unexpected ->" variable to handle all unmatched messages.