Erlang receive and after - erlang

p(M, C, B, I) when B =:= 32#3J ->
receive {_} -> a
after 27#5C ->
C ! { self(), { M, (B * 13#37) rem 35#7B, I }} end;
This is a part of code that expects input. I understand that it needs to look like (num,num,115,num) to pass the first part, but I don't understand what happens after that with the receive and the after part, can anybody explain?
I tried reading documentation of erlang and just couldn't understand that part of the code.

Here's what a receive expression looks like:
receive
Pattern1 [when GuardSeq1] ->
Body1;
...;
PatternN [when GuardSeqN] ->
BodyN
after
ExprT ->
BodyT
end
In erlang, a receive waits for a message to be sent to the process executing the receive. The message has to match a Pattern, in your case{_}, for the Body of the matching clause to execute (otherwise the receive waits indefinitely for a matching message). The term {_} matches any message that consists of a tuple, {}, with a single term inside the tuple, _. The notation _ matches anything, so the tuple could contain a nested tuple, a list, a string, a number...anything. So, the tuple {[1,a,"hello"]} would match, but not the tuple {a, 2}.
receive..after works exactly as receive, except that if no matching message has arrived within ExprT milliseconds, then BodyT is evaluated instead.
Here's your code:
p(M, C, B, I) when B =:= 32#3J ->
receive
{_} ->
a
after 27#5C ->
C ! { self(), { M, (B * 13#37) rem 35#7B, I }}
end;
Integers can be expressed in the format: base#value, and 27#5C is the same as the decimal number 147.
In this line:
C ! { self(), { M, (B * 13#37) rem 35#7B, I }}
the variable C will contain a process identifier, called a "pid" in erlang, and the operator ! sends a message to the process specified on the left hand side. The right had side is the message that is sent to process C, which in this case is a tuple containing the current process identifier, self(), as the first term, and another tuple as the second term.
Your code consists of a function that contains a receive statement, where the receive has a 147 millisecond timeout. After the receive line executes, if there's a message currently in the process mail box that matches {_} or if the process receives a message that matches {_} within the next 147 milliseconds, then the function returns the atom a, otherwise the function sends a message to another process and returns the same message--because ! returns the right hand side.
In erlang, you can easily start 1 million processes on a basic laptop, and the processes can send each other messages and the processes can receive particular messages and execute code in response to those messages.

Related

idiomatic process synchronisation in Erlang

I am looking at how to code "map reduce" type scenarios directly in erlang. As a toy example, imagine I want to decide which of several files is the biggest. Those files might be anywhere on the internet, so getting each one might take some time; so I'd like to gather them in parallel. Once I have them all, I can compare their sizes.
My assumed approach is as follows:
A 'main' process to co-ordinate the work and determine which is biggest;
A 'worker' process for each file, which fetches the file and returns the size to the main process.
Here's a clunky but functioning example (using local files only, but it shows the intent):
-module(cmp).
-export([cmp/2]).
cmp(Fname1, Fname2) ->
Pid1 = fsize(Fname1),
Pid2 = fsize(Fname2),
{Size1, Size2} = collect(Pid1, Pid2),
if
Size1 > Size2 ->
io:format("The first file is bigger~n");
Size2 > Size1 ->
io:format("The second file is bigger~n");
true ->
io:format("The files are the same size~n")
end.
fsize(Fname) ->
Pid = spawn(?MODULE, fsize, [self(), Fname]),
Pid.
fsize(Sender, Fname) ->
Size = filelib:file_size(Fname),
Sender ! {self(), Fname, Size}.
collect(Pid1, Pid2) ->
receive
{Pida, Fnamea, Sizea} ->
io:format("Pid: ~p, Fname: ~p, Size: ~p~n", [Pida, Fnamea, Sizea])
end,
receive
{Pidb, Fnameb, Sizeb} ->
io:format("Pid: ~p, Fname: ~p, Size: ~p~n", [Pidb, Fnameb, Sizeb])
end,
if
Pida =:= Pid1 -> {Sizea, Sizeb};
Pida =:= Pid2 -> {Sizeb, Sizea}
end.
Specific Questions
Is the approach idiomatic? i.e. hiving off each 'long running' task into a separate process, then collecting results back in a 'master'?
Is there a library to handle the synchronisation mechanics? Specifically, the collect function in the example above?
Thanks.
--
Note: I know the collect function in particular is clunky; it could be generalised by e.g. storing the pids in a list, and looping until all had completed.
In my opinion it's best to learn from an example, so I had a look at how they do that in otp/rpc and based on that I implemented a bit shorter/simpler version of the parallel eval call.
call(M, F, ArgL, Timeout) ->
ReplyTo = self(),
Keys = [spawn(fun() -> ReplyTo ! {self(), promise_reply, M:F(A)} end) || A <- ArgL],
Yield = fun(Key) ->
receive
{Key, promise_reply, {error, _R} = E} -> E;
{Key, promise_reply, {'EXIT', {error, _R} = E}} -> E;
{Key, promise_reply, {'EXIT', R}} -> {error, R};
{Key, promise_reply, R} -> R
after Timeout -> {error, timeout}
end
end,
[Yield(Key) || Key <- Keys].
I am not a MapReduce expert but I did had some experience using this 3rd party mapreduce module. So I will try to answer your question based on my current knowledge.
First, your input should be arranged as pairs of keys and values in order to properly use the mapreduce model. In general, your master process should first start workers processes (or nodes). Each worker receives a map function and a pair of key and value, lets name it {K1,V1}. It then executes the map function with the key and value and emits a new pair of key and value {K2,V2}. The master process collects the results and waits for all workers to finish their jobs. After all workers are done, the master starts the reduce part on the pairs {K2,List[V2]} that were emited by the workers. This part can be executed in parallel or not, it used to combine all the results into a single output. Note that the List[V2] is because there can be more then one value that was emited by the workers for a single K2 key.
From the 3rd party module I mentioned above:
%% Input = [{K1, V1}]
%% Map(K1, V1, Emit) -> Emit a stream of {K2,V2} tuples
%% Reduce(K2, List[V2], Emit) -> Emit a stream of {K2,V2} tuples
%% Returns a Map[K2,List[V2]]
If we look into Erlangs' lists functions, the map part is actually equal for doing lists:map/2 and the reduce part is in some way similar to lists:foldl/3 or lists:foldr/3 and the combination between them are: lists:mapfoldl/3, lists:mapfoldr/3.
If you are using this pattern of mapreduce using sets of keys and values, there is no need for special synchronization if that is what you mean. You just need to wait for all workers to finish their job.
I suggest you to go over the 3rd party module I mentioned above. Take also a look at this example. As you can see, the only things you need to define are the Map and Reduce functions.

Find the length of the longest substring

I see questions similar like this ones, but eventually, for different programming languages. I'm trying to solve this little problem:
Given a string, find the length of the longest substring without
repeating characters. For example, the longest substring without
repeating letters for abcabcbb is abc, which the length is 3. For
bbbbb the longest substring is b, with the length of 1.
I don't need the anwer to it but why what I have so far fails in the second iteration.
1> longest_substring:main("abcabcbb").
H: 97, T: "bcabcbb", Sub: []
Is 97 in []? false
H: 98, T: "cabcbb", Sub: 97
** exception error: no function clause matching string:chr(97,98,1) (string.erl, line 99)
in function longest_substring:process/2 (src/leetcode/algorithms/longest_substring.erl, line 28)
2>
This is the source code:
-module(longest_substring).
-export([main/1]).
-spec main(S :: string()) -> string().
%%%==================================================================
%%% Export
%%%==================================================================
main(S) -> process(S, "").
%%%==================================================================
%%% Internal
%%%==================================================================
process([], Sub) -> string:len(Sub);
process([H | T], Sub) ->
io:format("H: ~w, T: ~p, Sub: ~p~n", [H, T, Sub]),
Found = string:chr(Sub, H),
io:format("Is ~w in ~p? ~p~n", [H, Sub, Found =/= 0]),
% Don't know how to make this `if` thing better...
if
Found > 0 -> process(T, H);
_ -> process(T, string:concat(Sub, H))
end.
You have two places where you are treating character H as a string, both within the if:
if
Found > 0 -> process(T, H);
_ -> process(T, string:concat(Sub, H))
end.
Both appearances of H here need to be [H] instead, to form a string from the single character. (Also, your final clause in the if needs to use true, not an underscore — you should be getting a compiler error about this.)
Currently your solution returns a number, not a string. It also fails to remember any longer substring that might appear early in the string. To fix that, you need to remember the longest substring you've seen so far, which means you need another accumulator:
-module(longest_substring).
-export([main/1]).
-spec main(S :: string()) -> string().
main(S) -> process(S, {0,[]}, {0,[]}).
process([], {LL,Last}, {LG,_}) when LL > LG -> Last;
process([], _, {_,Long}) -> Long;
process([H | T], {LL,Last}=Sub, {LG,_}=Long) ->
case string:rchr(Last, H) of
0 ->
process(T, {LL+1,string:concat(Last,[H])}, Long);
N ->
NewLast = {1+LL-N,string:substr(Last,N+1)++[H]},
process(T, NewLast,
case LL > LG of
true ->
Sub;
false ->
Long
end)
end.
The main/1 function passes two accumulators to process/3, each of which is a pair of a length and a list. The first accumulator tracks the current substring, and the second tracks the longest substring seen so far.
In the last clause of process/3, we first check if H is in the current substring; if not, we add it to the current substring, increase its length by 1, and call process/3 again with the tail of the string. But if H is found in the current substring, we calculate the new current substring using the return value of string:rchr/2 to preserve the longest portion of the previous substring that we can (the original solution does not do this). We then check to see if the length of the current substring is greater than the current longest substring, and if so, we make it the longest substring, or if not we throw it away and keep the current longest substring, and then continue with the tail of the string. (Note that we could also make this check for greater or equal instead of greater; this would make our function return the last longest substring we find rather than the first.)
The first two clauses of process/3 handle the case where the input string has been fully processed. They just decide if the current substring is longer than the longest seen so far and return the longer of the two. (The alternative of using a greater or equal comparison applies here as well.)
for fun, I propose you to avoid complex search. In this solution I create a process for each element of the list holding: the element itself, the Pid of the next process/element in the list, and the Pid of the caller.
To initiate the search, I send to each process/element an empty list.
Each time a process/element receives a list, it checks if its stored element is a member of the received list. If yes, the list is send back to the caller, if not the element is prepend to the list and the new list is sent to the next process/element to continue the evaluation.
The caller process simply waits for as many returned messages as it has sent.
I have added a stop message and a special case for the last element of the list.
-module (longer).
-compile([export_all]).
char_proc(V,Next,Caller) ->
receive
stop -> ok;
Str ->
case lists:member(V,Str) of
true -> Caller ! Str;
false -> send(Next,Caller,[V|Str])
end,
char_proc(V,Next,Caller)
end.
send(noproc,Caller,Str) -> Caller ! Str;
send(Next,_,Str) -> Next ! Str.
find(Str) ->
Me = self(),
Pids = tl(lists:reverse(lists:foldl(fun(X,Acc) -> Pid = spawn(?MODULE,char_proc,[X,hd(Acc),Me]), [Pid|Acc] end,[noproc],Str))),
[X ! [] || X <- Pids],
R = receive_loop(0,[],length(Str)),
[X ! stop || X <- Pids],
R.
receive_loop(N,S,0) -> {N,S};
receive_loop(N,S,I) ->
receive
M when length(M) > N ->
receive_loop(length(M),M,I-1);
_ ->
receive_loop(N,S,I-1)
end.
tested in the shell:
1> c(longer).
{ok,longer}
2> longer:find("abcdad").
{4,"abcd"}
3> longer:find("abcdadtfrseqgepz").
{9,"adtfrseqg"}
4> longer:find("aaaaaaaaaaaa").
{1,"a"}
5> longer:find("abcdatfrseqgepz").
{11,"bcdatfrseqg"}
6>
Note there is no guarantee about witch sub-string will be returned if it exists several solutions, it is very easy to modify the code to return either the first sub-string or all of them.

Purpose of `receive after 0` (also known as Selective Receives)

From the Learn You Some Erlang for Great Good!
Another special case is when the timeout is at 0:
flush() ->
receive
_ -> flush()
after 0 ->
ok
end
.
When that happens, the Erlang VM will try and find a message that fits
one of the available patterns. In the case above, anything matches. As
long as there are messages, the flush/0 function will recursively call
itself until the mailbox is empty. Once this is done, the after 0 ->
ok part of the code is executed and the function returns.
I don't understand purpose of after 0. After reading above text I thought it was like after infinity (waiting forever) but I changed a little the flush function:
flush2() ->
receive
_ -> timer:sleep(1000), io:format("aa~n"), flush()
after 0 ->
okss
end
.
flush3() ->
receive
_ -> io:format("aa~n"), flush()
after 0 ->
okss
end
.
In the first function it waits 1 second and in the second function it doesn't wait.
In both cases it doesn't display a text (aa~n).
So it doesn't work as after infinity.
If block between the receive and the after are not executed then above 2 codes can be simplified to:
flush4() ->
okss
.
What I am missing?
ps. I am on the Erlang R16B03-1, and author of the book was, as fair I remember, was on the Erlang R13.
Every process has a 'mailbox' -- message queue. Messages can be fetched by receive. if there is no messages in the queue. after part specifies how much time 'receive will wait for them. So after 0 -- means process checking (by receive ) if any messages in the queue and if queue is empty immediately continue to next instructions.
It can be used for instance if we want periodically check if any messages here and to do something (hopefully helpful) if there is no messages.
Consider after 0 to be finally.
Consider the use of after 0 to process receives with a priority: http://learnyousomeerlang.com/more-on-multiprocessing#selective-receives
May this different look on things enlighten you.
You can play with the following shell command to understand the effect of the after command:
4> L = fun(G) ->
4> receive
4> stop -> ok;
4> M -> io:format("received ~p~n",[M]), G(G)
4> after 0 ->
4> io:format("no message~n")
4> end
4> end.
#Fun<erl_eval.6.80484245>
5> F = fun() -> timer:sleep(10000),
5> io:format("end of wait for messages, go to receive block~n"),
5> L(L)end.
#Fun<erl_eval.20.80484245>
6> spawn(F).
<0.46.0>
end of wait for messages, go to receive block
no message
7> P1 = spawn(F).
<0.52.0>
8> P1 ! hello.
hello
end of wait for messages, go to receive block
received hello
no message
9> P2 ! hello, P2 ! stop.
* 1: variable 'P2' is unbound
8> P2 = spawn(F).
<0.56.0>
9> P2 ! hello, P2 ! stop.
stop
end of wait for messages, go to receive block
received hello
10>
If you do not intend to use a nested receive, rather than using "after" part, I think a better approach is to use "Unexpected ->" variable to handle all unmatched messages.

novice question - should one use ifs or short functions when programming erlang?

I started studying Programming Erlang book in earnest recently, and I have a question.
Is the below proper approach to Erlang? This is (modified for brevity (no quit message), with logging removed after basic verification) solution to the ring problem from ch4. Processes exit after they have passed the message intended number of times; first process waits for the last message to reach it and exits.
Aside from general criticism of style and substance, can you please tell me if writing special 1-2 line functions like this a correct style, or if one should use if-s, cases-s, etc?
start_ring( 0, _, _ ) -> {error, badarg};
start_ring( _, 0, _ ) -> {error, badarg};
start_ring( M, N, Message ) ->
spawn( ring, run_ring, [M, N, Message, 0] ).
% last process that connects the ring
run_ring( M, 1, Message, Pid ) when is_pid(Pid) ->
loop_ring( M, Message, Pid, false );
% process in the middle
run_ring( M, N, Message, Pid ) when is_pid(Pid) ->
loop_ring( M, Message, spawn( ring, run_ring, [M, N-1, Message, Pid] ), false );
% first process - special case for one process
run_ring( M, 1, Message, _ ) ->
loop_ring( M, self() ! Message, self(), true );
% first process
run_ring( M, N, Message, _ ) ->
NextPid = spawn( ring, run_ring, [M, N-1, Message, self()] ),
loop_ring( M, NextPid ! Message, NextPid, true ).
loop_ring( 0, _, _, _ ) -> ok;
loop_ring( 1, Message, Next, true ) -> ok;
loop_ring( M, Message, Next, IsMaster ) ->
receive
Message -> loop_ring( M - 1, Next ! Message, Next, IsMaster )
end.
I think your style is very good and concise! Good work!
A few comments (a matter of personal taste):
Start ring can be rewritten as:
start_ring( M, N, Message ) when M < N, N > 0, M > 0 ->
spawn( ring, run_ring, [M, N, Message, 0] ).
This will crash with a function_clause error if used improperly. There's a good habit when dealing with error returns, that if the user can do something sensible with the error, return for example {error, Reason}, otherwise just crash. I think in this case it is safe to just crash, because any other input would be a bug in the program.
run_ring/4 + loop_ring/4: I don't like to use line breaks between functions with several clauses. That makes it harder to see where the function begins and ends. The comments can then be put inside the clause body instead of outside. It becomes much easier to identify the function headers now (and to see the function as one unit):
run_ring(M, 1, Message, Pid) when is_pid(Pid) ->
% last process that connects the ring
loop_ring(M, Message, Pid, false);
run_ring(M, N, Message, Pid) when is_pid(Pid) ->
% process in the middle
loop_ring(M, Message, spawn(ring, run_ring, [M, N-1, Message, Pid]), false);
run_ring(M, 1, Message, _) ->
% first process - special case for one process
loop_ring(M, self() ! Message, self(), true);
run_ring(M, N, Message, _) ->
% first process
NextPid = spawn(ring, run_ring, [M, N-1, Message, self()]),
loop_ring(M, NextPid ! Message, NextPid, true).
I personally dislike spaces inside parentheses (as a I said, personal taste). :-) Makes the code more "fluffy".
Use spawn_link/3 instead of spawn/3 unless you know that you don't want it. It makes it much easier to detect bugs etc when you're developing your program.
The second clause of loop_ring/4 emits compiler warnings. Use _Message and _Next instead (use them for the first clause as well, it's bonus documentation!)
According to the Erlang best practices, one should avoid if and case nested more than twice:
Nested code is code containing
case/if/receive statements within
other case/if/receive statements. It
is bad programming style to write
deeply nested code - the code has a
tendency to drift across the page to
the right and soon becomes unreadable.
Try to limit most of your code to a
maximum of two levels of indentation.
This can be achieved by dividing the
code into shorter functions.
Apart from that, I guess it is a matter of taste to use an if/case or simple pattern matching. Personally, I prefer using pattern matching rather than if or cases. So, you're doing it the right way if you ask me.
Regarding if and case, usually you can rewrite the former into the latter. Someone says:
"always use the case, the if construct
is kinda broken".
Well, the two constructs work very differently. The expression evaluated in the if construct is a guard and it has a lot of limits - due to the fact you can't have side effects into guards, being evaluated irrespectively of the branch taken -. The case construct doesn't have this "limitation". You can use any expression there, whose result will be matched against the patterns forming the case.
Possible duplicate:
Erlang style - case vs function pattern matching

ERLANG - Pattern Matching

I have a variable:
Data = [[<<>>,
[<<"10">>,<<"171">>],
[<<"112">>,<<"Gen20267">>],
[<<"52">>,<<"20100812-06:32:30.687">>]]
I am trying to pattern match for two specific cases..
One where anything that resembles the outside structure - simply []
Anything inside goes I have tried [ _ ] but no go?
The Second, for a specific pattern inside, like when I see a <<"10">> or <<"112">> or <<"52">> then I am going to take the right side which is the actual data into an atom.
Basically the <<"10">> or <<"112">> or <<"52">> are the fields, the right side the data.
I have tried statements like [<<"10">>, _ ] still no go
Here is the rest of the code:
dataReceived(Message) ->
receive
{start} ->
ok;
[ _ ] -> %%No go
io:format("Reply 1 = ~p~n", [Message]);
[<<"10">>, _ ] -> %%No go
io:format("Reply 1 = ~p~n", [Message])
end.
As a note the Message is not sent as a tuple it is exactly like Data =
Can anyone lead me in the right direction?
Thanks and Goodnight!
-B
UPDATE
Ok now I think Im getting warmer, I have to pattern match whatever comes in.
So if I had say
Message = = [[<<>>],
[<<"10">>,<<"171">>],
[<<"112">>,<<"Gen20267">>],
[<<"52">>,<<"20100812-06:32:30.687">>]]
And I was looking to pattern match the field <<"112">>
Such as the 112 is always going to say 112, but the Gen2067 can change whenever to whatever.. its the data, it will be stored in a variable.
loop() ->
receive
[_,[<<"112">>, Data], _] when is_list(X) -> %% Match a list inside another.
?DEBUG("Got a list ~p~n", [X]),
loop();
_Other ->
?DEBUG("I don't understand ~p~n", [_Other]),
loop()
end.
I feel im close, but not 100%
-B
Update OP is trying to pass an argument to the function and not send messages.
As the name indicates the receive block is used to receive and process messages sent to a process. When you call dataReceived with an argument it proceeds to wait for messages. As no messages are sent it will continue to wait endlessly. Given the current code if you want it to do something then you'll have to spawn the function, get the process ID and then send a message to the process ID.
You probably need a function where the argument is pattern matched and not messages.
Something like this:
dataReceived([Message]) when is_list(Message) ->
io:format("Got a list as arg ~p~n", [Message]);
dataReceived(_Other) ->
io:format("Unknown arg ~p~n", [_Other]).
On a side note your third pattern [X] when is_list(X) will never match as the second pattern is a superset of the third. Anything that matches [X] when is_list(X) will always match [X] and therefore your third match clause will never get triggered.
Original Answer
I am not sure I understand your question. Are you trying to send a message to the function or are you passing it an argument?
This is a partial answer about how to match a list of lists in case you are sending a message.
-module(mtest).
-export([run/0]).
-ifdef(debug).
-define(DEBUG(Format, Args), io:format(Format, Args)).
-else.
-define(DEBUG(Format, Args), void).
-endif.
loop() ->
receive
[X] when is_list(X) -> %% Match a list inside another.
?DEBUG("Got a list ~p~n", [X]),
loop();
_Other ->
?DEBUG("I don't understand ~p~n", [_Other]),
loop()
end.
Take a look at the first clause in the receive block. [X] when is_list(X) will bind the inner list to the name X. I tested it with the value of Data you provided and it worked.
%% From the shell.
1> c(mtest, {d, debug}).
{ok,mtest}
2> Pid = mtest:run().
<0.40.0>
3> Data = [[<<>>, [<<"10">>,<<"171">>], [<<"112">>,<<"Gen20267">>], [<<"52">>,<<"20100812-06:32:30.687">>]]].
[[<<>>,
[<<"10">>,<<"171">>],
[<<"112">>,<<"Gen20267">>],
[<<"52">>,<<"20100812-06:32:30.687">>]]]
4> Pid ! Data.
[[<<>>,
[<<"10">>,<<"171">>],
[<<"112">>,<<"Gen20267">>],
[<<"52">>,<<"20100812-06:32:30.687">>]]]
Got a list [<<>>,
[<<"10">>,<<"171">>],
[<<"112">>,<<"Gen20267">>],
[<<"52">>,<<"20100812-06:32:30.687">>]]
5>

Resources