For the following fragment:
outer_func(State) ->
    spawn(fun() -> do_something(State) end).
Will State be shared or deep-copied to the spawned process heap?
It will be deep copied. Here's a simple demo:
1> State = lists:seq(1, 1000000).
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
23,24,25,26,27,28,29|...]
2> DoSomething = fun(_State) -> io:format("~p~n", [process_info(self(), memory)]) end.
3> spawn(fun() -> DoSomething(State) end), spawn(fun() -> DoSomething(State) end), spawn(fun() -> DoSomething(State) end).
{memory,16583520}
{memory,16583520}
{memory,16583520}
In contrast to that, here's the output when the state is a large binary. Binaries larger than 64 bytes are reference-counted and stored off-heap, so they are never "deep" copied when shared with multiple processes:
1> State = binary:copy(<<"a">>, 50000000).
<<"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"...>>
2> DoSomething = fun(_State) -> io:format("~p~n", [process_info(self(), memory)]) end.
3> spawn(fun() -> DoSomething(State) end), spawn(fun() -> DoSomething(State) end), spawn(fun() -> DoSomething(State) end).
{memory,8744}
{memory,8744}
{memory,8744}
So a process holding the list of integers from 1 to 1 million used about 16 MB of memory, while the one holding the large binary used about 8 KB (and since only a small reference to the off-heap binary lives on the process heap, the binary itself is a negligible part of even that).
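If you want to estimate up front how much a term would cost to copy, erts_debug:flat_size/1 reports its flattened size in machine words. A quick sketch (erts_debug is an internal debugging module, so treat this as a diagnostic aid rather than a stable API):
erts_debug:flat_size(lists:seq(1, 1000000)).          % ~2 words per cons cell -> ~16 MB at 8 bytes/word
erts_debug:flat_size(binary:copy(<<"a">>, 50000000)). % a handful of words: just the reference to the off-heap data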
I am learning the basics of functional programming and Erlang, and I've implemented three versions of the factorial function: using recursion with guards, using recursion with pattern matching, and using tail recursion.
I am trying to compare the performance of each factorial implementation (Erlang/OTP 22 [erts-10.4.1]):
%% Simple factorial code:
fac(N) when N == 0 -> 1;
fac(N) when N > 0 -> N * fac(N - 1).
%% Using pattern matching:
fac_pattern_matching(0) -> 1;
fac_pattern_matching(N) when N > 0 -> N * fac_pattern_matching(N - 1).
%% Using tail recursion (and pattern matching):
tail_fac(N) -> tail_fac(N, 1).
tail_fac(0, Acc) -> Acc;
tail_fac(N, Acc) when N > 0 -> tail_fac(N - 1, N * Acc).
Timer helper:
-define(PRECISION, microsecond).

execution_time(M, F, A, D) ->
    StartTime = erlang:system_time(?PRECISION),
    Result = apply(M, F, A),
    EndTime = erlang:system_time(?PRECISION),
    io:format("Execution took ~p ~ps~n", [EndTime - StartTime, ?PRECISION]),
    if
        D =:= true -> io:format("Result is ~p~n", [Result]);
        true -> ok
    end.
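(As an aside, the standard library's timer:tc/3 performs the same measurement, returning the elapsed microseconds together with the result; the answer below uses it.)
{Time, Result} = timer:tc(factorial, fac, [1000000]). % Time is in microseconds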
Execution results:
Recursive version:
3> mytimer:execution_time(factorial, fac, [1000000], false).
Execution took 1253949667 microseconds
ok
Recursive with pattern matching version:
4> mytimer:execution_time(factorial, fac_pattern_matching, [1000000], false).
Execution took 1288239853 microseconds
ok
Tail recursive version:
5> mytimer:execution_time(factorial, tail_fac, [1000000], false).
Execution took 1405612434 microseconds
ok
I was expecting the tail-recursive version to perform better than the other two, but to my surprise it is less performant. These results are the exact opposite of what I was expecting.
Why?
The problem is the function you chose. Factorial is a function that grows very fast, and Erlang implements big-integer arithmetic, so it will not overflow. You are effectively measuring how good the underlying big-integer implementation is. 1000000! is a huge number: it is 8.26×10^5565708, which is about 5.6 MB long when written out in decimal. There is a difference between your fac/1 and tail_fac/1 in how fast they reach big numbers, where the big-integer implementation kicks in, and how fast the number grows. Your fac/1 implementation effectively computes 1*2*3*4*...*N, because the multiplications happen as the stack unwinds, starting from the small end. Your tail_fac/1 implementation computes N*(N-1)*(N-2)*(N-3)*...*1, so the accumulator becomes huge almost immediately. Do you see the issue there? You can write the tail-call implementation in a different way:
tail_fac2(N) when is_integer(N), N > 0 ->
    tail_fac2(N, 0, 1).

tail_fac2(X, X, Acc) -> Acc;
tail_fac2(N, X, Acc) ->
    Y = X + 1,
    tail_fac2(N, Y, Y*Acc).
It will work much better. I'm not as patient as you are, so I will measure somewhat smaller numbers, but the new fact:tail_fac2/1 should outperform fact:fac/1 every single time:
1> element(1, timer:tc(fun()-> fact:fac(100000) end)).
7743768
2> element(1, timer:tc(fun()-> fact:fac(100000) end)).
7629604
3> element(1, timer:tc(fun()-> fact:fac(100000) end)).
7651739
4> element(1, timer:tc(fun()-> fact:tail_fac(100000) end)).
7229662
5> element(1, timer:tc(fun()-> fact:tail_fac(100000) end)).
7104056
6> element(1, timer:tc(fun()-> fact:tail_fac2(100000) end)).
6491195
7> element(1, timer:tc(fun()-> fact:tail_fac2(100000) end)).
6506565
8> element(1, timer:tc(fun()-> fact:tail_fac2(100000) end)).
6519624
As you can see, fact:tail_fac2/1 for N = 100000 takes 6.5s, fact:tail_fac/1 takes 7.2s, and fact:fac/1 takes 7.6s. Even the faster accumulator growth doesn't overturn the tail-call benefit, so the tail-call version is still faster than the body-recursive one, and the slower accumulator growth in fact:tail_fac2/1 clearly shows its impact.
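To make the difference in growth order concrete, here are the first few products each version forms (simple arithmetic, not from the original measurements):
%% fac/1: the multiplications happen from the small end as the stack
%% unwinds, so the operands stay small for a long time:
%%   1, 2*1 = 2, 3*2 = 6, 4*6 = 24, ...
%% tail_fac/1: the accumulator picks up the largest factors first,
%% so it is doing big-integer arithmetic almost immediately:
%%   100000, 100000*99999 = 9999900000, ...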
If you choose a different function for tail-call optimization testing, you can see the impact of the tail call more clearly. For example, sum:
sum(0) -> 0;
sum(N) when N > 0 -> N + sum(N-1).

tail_sum(N) when is_integer(N), N >= 0 ->
    tail_sum(N, 0).

tail_sum(0, Acc) -> Acc;
tail_sum(N, Acc) -> tail_sum(N-1, N+Acc).
And the speed is:
1> element(1, timer:tc(fun()-> fact:sum(10000000) end)).
970749
2> element(1, timer:tc(fun()-> fact:sum(10000000) end)).
126288
3> element(1, timer:tc(fun()-> fact:sum(10000000) end)).
113115
4> element(1, timer:tc(fun()-> fact:sum(10000000) end)).
104371
5> element(1, timer:tc(fun()-> fact:sum(10000000) end)).
125857
6> element(1, timer:tc(fun()-> fact:tail_sum(10000000) end)).
92282
7> element(1, timer:tc(fun()-> fact:tail_sum(10000000) end)).
92634
8> element(1, timer:tc(fun()-> fact:tail_sum(10000000) end)).
68047
9> element(1, timer:tc(fun()-> fact:tail_sum(10000000) end)).
87748
10> element(1, timer:tc(fun()-> fact:tail_sum(10000000) end)).
94233
As you can see, we can easily use N = 10000000 there and it works pretty fast. Anyway, the body-recursive function is significantly slower: 110ms vs 85ms. You may also notice that the first run of fact:sum/1 took 9x longer than the rest of the runs. That is because the body-recursive function consumes stack space: the first run has to grow the shell process's memory, while later runs reuse the already-grown memory. You will not see such an effect with the tail-recursive counterpart. (Try it.) You can see the difference if you run each measurement in a separate process:
1> F = fun(G, N) -> spawn(fun() -> {T, _} = timer:tc(fun()-> fact:G(N) end), io:format("~p took ~bus and ~p heap~n", [G, T, element(2, erlang:process_info(self(), heap_size))]) end) end.
#Fun<erl_eval.13.91303403>
2> F(tail_sum, 10000000).
<0.88.0>
tail_sum took 70065us and 987 heap
3> F(tail_sum, 10000000).
<0.90.0>
tail_sum took 65346us and 987 heap
4> F(tail_sum, 10000000).
<0.92.0>
tail_sum took 65628us and 987 heap
5> F(tail_sum, 10000000).
<0.94.0>
tail_sum took 69384us and 987 heap
6> F(tail_sum, 10000000).
<0.96.0>
tail_sum took 68606us and 987 heap
7> F(sum, 10000000).
<0.98.0>
sum took 954783us and 22177879 heap
8> F(sum, 10000000).
<0.100.0>
sum took 931335us and 22177879 heap
9> F(sum, 10000000).
<0.102.0>
sum took 934536us and 22177879 heap
10> F(sum, 10000000).
<0.104.0>
sum took 945380us and 22177879 heap
11> F(sum, 10000000).
<0.106.0>
sum took 921855us and 22177879 heap
Why is the receive expression sometimes called a selective receive?
What is the "save queue"?
How does the after section work?
There is a special "save queue" involved in the procedure, but when you first encounter the receive expression you may ignore its presence.
Optionally, there may be an after section in the expression, which complicates the procedure a little.
The general form of a receive expression is:
receive
    pattern1 -> expressions1;
    pattern2 -> expressions2;
    pattern3 -> expressions3
after
    Time -> expressionsTimeout
end
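In words: messages are taken from the mailbox one at a time, oldest first, and each one is tried against pattern1, pattern2, ... in order. If a pattern matches, the message is removed from the mailbox, the corresponding expressions run, and any messages previously set aside in the save queue are put back into the mailbox. If no pattern matches, the message is moved to the save queue and the next mailbox message is tried. If the mailbox runs out of messages, the process blocks until a new one arrives, or, if there is an after section, until Time milliseconds have elapsed, at which point expressionsTimeout runs (and the saved messages are likewise put back).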
Why is the receive expression sometimes called a selective receive?
-module(my).
%-export([test/0, myand/2]).
-compile(export_all).
-include_lib("eunit/include/eunit.hrl").

start() ->
    spawn(my, go, []).

go() ->
    receive
        {xyz, X} ->
            io:format("I received X=~w~n", [X])
    end.
In the erlang shell:
1> c(my).
my.erl:3: Warning: export_all flag enabled - all functions will be exported
{ok,my}
2> Pid = my:start().
<0.79.0>
3> Pid ! {hello, world}.
{hello,world}
4> Pid ! {xyz, 10}.
I received X=10
{xyz,10}
Note how there was no output for the first message sent, but there was output for the second. The receive was selective: it did not receive all messages, it received only messages matching the specified pattern.
What is the "save queue"?
-module(my).
%-export([test/0, myand/2]).
-compile(export_all).
-include_lib("eunit/include/eunit.hrl").

start() ->
    spawn(my, go, []).

go() ->
    receive
        {xyz, X} ->
            io:format("I received X=~w~n", [X])
    end,
    io:format("What happened to the message that didn't match?"),
    receive
        Any ->
            io:format("It was saved rather than discarded.~n"),
            io:format("Here it is: ~w~n", [Any])
    end.
In the erlang shell:
1> c(my).
my.erl:3: Warning: export_all flag enabled - all functions will be exported
{ok,my}
2> Pid = my:start().
<0.79.0>
3> Pid ! {hello, world}.
{hello,world}
4> Pid ! {xyz, 10}.
I received X=10
What happened to the message that didn't match?{xyz,10}
It was saved rather than discarded.
Here it is: {hello,world}
How does the after section work?
-module(my).
%-export([test/0, myand/2]).
-compile(export_all).
-include_lib("eunit/include/eunit.hrl").

start() ->
    spawn(my, go, []).

go() ->
    receive
        {xyz, X} ->
            io:format("I received X=~w~n", [X])
    after 10000 ->
        io:format("I'm not going to wait all day for a match. Bye.")
    end.
In the erlang shell:
1> c(my).
my.erl:3: Warning: export_all flag enabled - all functions will be exported
{ok,my}
2> Pid = my:start().
<0.79.0>
3> Pid ! {hello, world}.
{hello,world}
I'm not going to wait all day for a match. Bye.4>
Another example:
-module(my).
%-export([test/0, myand/2]).
-compile(export_all).
-include_lib("eunit/include/eunit.hrl").

sleep(X) ->
    receive
    after X * 1000 ->
        io:format("I just slept for ~w seconds.~n", [X])
    end.
In the erlang shell:
1> c(my).
my.erl:3: Warning: export_all flag enabled - all functions will be exported
{ok,my}
2> my:sleep(5).
I just slept for 5 seconds.
ok
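A related idiom worth knowing (my sketch, not part of the original examples): an after timeout of 0 makes receive non-blocking, which is handy for draining whatever is left in the mailbox, including messages the save queue put back:
%% "after 0" fires as soon as no mailbox message matches, so this loop
%% stops once the mailbox is empty instead of blocking.
flush_mailbox() ->
    receive
        Any ->
            io:format("Got: ~w~n", [Any]),
            flush_mailbox()
    after 0 ->
        ok
    end.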
Here is an example trace where I'm able to call erlang:monitor/2 repeatedly on the same Pid:
1> Loop = fun F() -> F() end.
#Fun<erl_eval.30.99386804>
2> Pid = spawn(Loop).
<0.71.0>
3> erlang:monitor(process, Pid).
#Ref<0.2485499597.1470627842.126937>
4> erlang:monitor(process, Pid).
#Ref<0.2485499597.1470627842.126942>
5> erlang:monitor(process, Pid).
#Ref<0.2485499597.1470627842.126947>
The references returned at prompts 4 and 5 are different from the one returned at prompt 3, meaning that it is possible to create multiple monitor references between the current process and Pid. Is there a practical case where you would need or use multiple monitor references to the same process?
I would expect this to return the same reference (returning a new one would perhaps imply that the old one had failed/crashed), following the same logic that exists for link/1.
Imagine you use a third-party library which does this (basically what the OTP *:call/* functions do):
call(Pid, Request) ->
    call(Pid, Request, ?DEFAULT_TIMEOUT).

call(Pid, Request, Timeout) ->
    MRef = erlang:monitor(process, Pid),
    Pid ! {call, self(), MRef, Request},
    receive
        {answer, MRef, Result} ->
            erlang:demonitor(MRef, [flush]),
            {ok, Result};
        {'DOWN', MRef, _, _, Info} ->
            {error, Info}
    after Timeout ->
        erlang:demonitor(MRef, [flush]),
        {error, timeout}
    end.
and then you use it in your own code, where you monitor the same process Pid yourself and then call call/2,3:
my_fun1(Service) ->
    MRef = erlang:monitor(process, Service),
    ok = check_if_service_runs(MRef),
    my_fun2(Service),
    mind_my_stuff(),
    ok = check_if_service_runs(MRef),
    erlang:demonitor(MRef, [flush]),
    return_some_result().

check_if_service_runs(MRef) ->
    receive
        {'DOWN', MRef, _, _, Info} -> {down, Info}
    after 0 -> ok
    end.

my_fun2(S) -> my_fun3(S).

% ... and many layers of other stuff and modules

my_fun3(S) -> call(S, hello).
What a nasty surprise it would be if erlang:monitor/2,3 always returned the same reference and erlang:demonitor/1,2 removed your previous monitor: the call/3 buried deep inside the library would silently cancel the monitor your own code was relying on. It would be a source of ugly and unsolvable bugs. Remember that there are libraries and other processes, and your code is part of a huge system; Erlang was made by experienced people who thought it through. Maintainability is key here.
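To see that independence concretely, here is a minimal sketch (my example, not from the original answer): two monitors on the same process, where demonitoring one leaves the other intact, so exactly one 'DOWN' message arrives.
demo() ->
    Pid = spawn(fun() -> receive stop -> ok end end),
    M1 = erlang:monitor(process, Pid),
    M2 = erlang:monitor(process, Pid),
    true = erlang:demonitor(M1, [flush]),  % cancels only the first monitor
    Pid ! stop,
    receive
        {'DOWN', M2, process, Pid, normal} -> only_m2_fired
    end.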
I'm running the following commands in two Erlang shells, and they both generate the same sequence of random numbers. Does that mean random number generation is pseudorandom in Erlang? I'm curious about the rationale, since in Java the sequence will not be the same even if I provide the same seed twice. Many thanks!
random:seed(6, 6, 6).
random:uniform(100).
random:uniform(100).
...
The generated sequence: 12, 27, 79, 58, 90, 25, ...
What you're describing is generally how traditional pseudorandom number generators (PRNGs) have always worked, including Erlang's random module, which I think implements Wichmann-Hill, but today's PRNGs are necessarily more sophisticated. In Erlang 18 you'll find a new rand module that does not suffer from the problem you're describing.
As you can see from the shell session copied below, you can just call the rand:uniform/0,1 functions from different processes without seeding, and the initial numbers in the various processes will be different:
1> rand:uniform().
0.10584199892675317
2> Self = self().
<0.1573.0>
3> f(R), spawn(fun() -> Self ! rand:uniform() end), receive R -> R end.
0.9124422823012622
4> f(R), spawn(fun() -> Self ! rand:uniform() end), receive R -> R end.
0.9476479571869831
5> f(R), spawn(fun() -> Self ! rand:uniform() end), receive R -> R end.
0.037189460750910064
6> f(R), spawn(fun() -> Self ! rand:uniform() end), receive R -> R end.
0.17698653918897836
The first call runs directly in the shell process. We then get the shell's pid, store it into Self, and spawn four processes in succession that each send the result of rand:uniform/0 back to the shell, which receives it into R. As you can see, the four spawned processes each return different values, all of which differ from the value the shell got when it first ran rand:uniform/0.
If you want a number in a range other than 0-1, pass an integer N to rand:uniform/1 and you'll get a value V in the range 1 <= V <= N:
7> f(R), spawn(fun() -> Self ! rand:uniform(1234567) end), receive R -> R end.
510226
8> f(R), spawn(fun() -> Self ! rand:uniform(1234567) end), receive R -> R end.
562646
9> f(R), spawn(fun() -> Self ! rand:uniform(1234567) end), receive R -> R end.
250637
10> f(R), spawn(fun() -> Self ! rand:uniform(1234567) end), receive R -> R end.
820871
11> f(R), spawn(fun() -> Self ! rand:uniform(1234567) end), receive R -> R end.
121252
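If you do want a reproducible sequence from rand, you can seed it explicitly per process. A small sketch (algorithm names vary by OTP release; exsss is the default in recent releases, while older ones offer e.g. exsplus):
rand:seed(exsss, {6, 6, 6}),
rand:uniform(100). % deterministic in any shell seeded the same way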
While carefully studying the gproc project's gproc_tests.erl file, I found the following code.
The "goodbye" message is sent before erlang:monitor/2 is called, so I think it is possible that the 'DOWN' message won't be received. Is that correct? If so, should the two lines be switched?
t_simple_aggr_counter() ->
    ?assert(gproc:reg({c,l,c1}, 3) =:= true),
    ?assert(gproc:reg({a,l,c1}) =:= true),
    ?assert(gproc:get_value({a,l,c1}) =:= 3),
    P = self(),
    P1 = spawn_link(fun() ->
                        gproc:reg({c,l,c1}, 5),
                        P ! {self(), ok},
                        receive
                            {P, goodbye} -> ok
                        end
                    end),
    receive {P1, ok} -> ok end,
    ?assert(gproc:get_value({a,l,c1}) =:= 8),
    ?assert(gproc:update_counter({c,l,c1}, 4) =:= 7),
    ?assert(gproc:get_value({a,l,c1}) =:= 12),
    P1 ! {self(), goodbye},          %<<=========== This line
    R = erlang:monitor(process, P1), %<<=========== This line
    receive {'DOWN', R, _, _, _} ->
        gproc:audit_process(P1)
    end,
    ?assert(gproc:get_value({a,l,c1}) =:= 7).
No: the erlang:monitor/2 call will still generate a {'DOWN', ...} message to the calling process even if the monitored process has already died.
For example:
1> F = fun() -> io:format("finished.~n") end.
#Fun<erl_eval.20.111823515>
2> Pid = spawn(F).
finished.
<0.45.0>
3> erlang:monitor(process, Pid). % process Pid has already exited.
#Ref<0.0.0.76>
4> flush().
Shell got {'DOWN',#Ref<0.0.0.76>,process,<0.45.0>,noproc}
ok
According to the erlang:monitor/2 documentation: "A 'DOWN' message will be sent to the monitoring process if Item dies, if Item does not exist, or if the connection is lost to the node which Item resides on."