Tail recursion vs non tail recursion. Is the former slower? - erlang

I am learning the basics of functional programming and Erlang, and I've implemented three versions of the factorial function: using recursion with guards, using recursion with pattern matching, and using tail recursion.
I am trying to compare the performance of each factorial implementation (Erlang/OTP 22 [erts-10.4.1]):
%% Simple factorial code:
fac(N) when N == 0 -> 1;
fac(N) when N > 0 -> N * fac(N - 1).
%% Using pattern matching:
fac_pattern_matching(0) -> 1;
fac_pattern_matching(N) when N > 0 -> N * fac_pattern_matching(N - 1).
%% Using tail recursion (and pattern matching):
tail_fac(N) -> tail_fac(N, 1).
tail_fac(0, Acc) -> Acc;
tail_fac(N, Acc) when N > 0 -> tail_fac(N - 1, N * Acc).
Timer helper:
-define(PRECISION, microsecond).
execution_time(M, F, A, D) ->
    StartTime = erlang:system_time(?PRECISION),
    Result = apply(M, F, A),
    EndTime = erlang:system_time(?PRECISION),
    io:format("Execution took ~p ~ps~n", [EndTime - StartTime, ?PRECISION]),
    if
        D =:= true -> io:format("Result is ~p~n", [Result]);
        true -> ok
    end.
Execution results:
Recursive version:
3> mytimer:execution_time(factorial, fac, [1000000], false).
Execution took 1253949667 microseconds
ok
Recursive with pattern matching version:
4> mytimer:execution_time(factorial, fac_pattern_matching, [1000000], false).
Execution took 1288239853 microseconds
ok
Tail recursive version:
5> mytimer:execution_time(factorial, tail_fac, [1000000], false).
Execution took 1405612434 microseconds
ok
I was expecting the tail-recursive version to perform better than the other two but, to my surprise, it is slower. These results are the exact opposite of what I expected.
Why?

The problem is the function you chose. Factorial grows very fast. Erlang implements big integer arithmetic, so the result will not overflow, which means you are effectively measuring how good the underlying big integer implementation is. 1000000! is a huge number: it is 8.26×10^5565708, which is about 5.6 MB when written out as a decimal number. Your fac/1 and tail_fac/1 differ in how quickly they reach numbers large enough for the big integer implementation to kick in, and in how fast the intermediate number grows. In your fac/1 implementation you are effectively computing 1*2*3*4*...*N. In your tail_fac/1 implementation you are computing N*(N-1)*(N-2)*(N-3)*...*1, so the accumulator becomes a big integer almost immediately and stays larger at every step. Do you see the issue there? You can write the tail-call implementation in a different way:
tail_fac2(N) when is_integer(N), N >= 0 ->
    tail_fac2(N, 0, 1).
%% Count X up from 0 to N, so the accumulator grows as 1*2*3*...*N.
tail_fac2(X, X, Acc) -> Acc;
tail_fac2(N, X, Acc) ->
    Y = X + 1,
    tail_fac2(N, Y, Y*Acc).
It will work much better. I'm not as patient as you are, so I will measure with somewhat smaller numbers, but the new fact:tail_fac2/1 should outperform fact:fac/1 every single time:
1> element(1, timer:tc(fun()-> fact:fac(100000) end)).
7743768
2> element(1, timer:tc(fun()-> fact:fac(100000) end)).
7629604
3> element(1, timer:tc(fun()-> fact:fac(100000) end)).
7651739
4> element(1, timer:tc(fun()-> fact:tail_fac(100000) end)).
7229662
5> element(1, timer:tc(fun()-> fact:tail_fac(100000) end)).
7104056
6> element(1, timer:tc(fun()-> fact:tail_fac2(100000) end)).
6491195
7> element(1, timer:tc(fun()-> fact:tail_fac2(100000) end)).
6506565
8> element(1, timer:tc(fun()-> fact:tail_fac2(100000) end)).
6519624
As you can see, for N = 100000 fact:tail_fac2/1 takes about 6.5 s, fact:tail_fac/1 about 7.2 s, and fact:fac/1 about 7.6 s. So even with its faster-growing accumulator, the tail-recursive tail_fac/1 still beats the body-recursive fac/1, and the slower accumulator growth in fact:tail_fac2/1 clearly shows its effect.
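To make the accumulator-growth argument concrete, here is a minimal sketch that adds up the size in bits of every intermediate product for both multiplication orders. The module name acc_growth and its functions are hypothetical helpers, not part of the code above, and bit counts are only a rough proxy for the cost of big integer multiplication.

```erlang
-module(acc_growth).
-export([total_bits/2, bits/1]).

%% Number of bits needed to represent a non-negative integer.
bits(0) -> 1;
bits(N) when N > 0 -> bits(N, 0).

bits(0, B) -> B;
bits(N, B) -> bits(N bsr 1, B + 1).

%% Sum of the bit sizes of every intermediate product.
%% ascending  multiplies 1*2*3*...*N (like fac/1 and tail_fac2/1),
%% descending multiplies N*(N-1)*...*1 (like tail_fac/1).
total_bits(ascending, N) -> walk(lists:seq(1, N), 1, 0);
total_bits(descending, N) -> walk(lists:seq(N, 1, -1), 1, 0).

walk([], _Acc, Total) -> Total;
walk([X | Rest], Acc, Total) ->
    Next = X * Acc,
    walk(Rest, Next, Total + bits(Next)).
```

For N = 5, acc_growth:total_bits(ascending, 5) gives 18 and acc_growth:total_bits(descending, 5) gives 28: the descending order carries larger intermediate values through every step, even though both end at the same result.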
If you choose a different function for tail call optimization testing you can see the impact of tail call optimization more clearly. For example sum:
sum(0) -> 0;
sum(N) when N > 0 -> N + sum(N-1).
tail_sum(N) when is_integer(N), N >= 0 ->
tail_sum(N, 0).
tail_sum(0, Acc) -> Acc;
tail_sum(N, Acc) -> tail_sum(N-1, N+Acc).
And speed is:
1> element(1, timer:tc(fun()-> fact:sum(10000000) end)).
970749
2> element(1, timer:tc(fun()-> fact:sum(10000000) end)).
126288
3> element(1, timer:tc(fun()-> fact:sum(10000000) end)).
113115
4> element(1, timer:tc(fun()-> fact:sum(10000000) end)).
104371
5> element(1, timer:tc(fun()-> fact:sum(10000000) end)).
125857
6> element(1, timer:tc(fun()-> fact:tail_sum(10000000) end)).
92282
7> element(1, timer:tc(fun()-> fact:tail_sum(10000000) end)).
92634
8> element(1, timer:tc(fun()-> fact:tail_sum(10000000) end)).
68047
9> element(1, timer:tc(fun()-> fact:tail_sum(10000000) end)).
87748
10> element(1, timer:tc(fun()-> fact:tail_sum(10000000) end)).
94233
As you can see, here we can easily use N = 10000000 and it still runs quickly. Even so, the body-recursive function is noticeably slower: roughly 110 ms vs 85 ms. Notice that the first run of fact:sum/1 took about 9x longer than the later runs. That is because the body-recursive function consumes stack: the first run has to grow the process's memory, while later runs in the same shell process benefit from the memory already allocated. You will not see this effect with the tail-recursive counterpart. (Try it.) The difference becomes visible if you run each measurement in a separate process:
1> F = fun(G, N) -> spawn(fun() -> {T, _} = timer:tc(fun()-> fact:G(N) end), io:format("~p took ~bus and ~p heap~n", [G, T, element(2, erlang:process_info(self(), heap_size))]) end) end.
#Fun<erl_eval.13.91303403>
2> F(tail_sum, 10000000).
<0.88.0>
tail_sum took 70065us and 987 heap
3> F(tail_sum, 10000000).
<0.90.0>
tail_sum took 65346us and 987 heap
4> F(tail_sum, 10000000).
<0.92.0>
tail_sum took 65628us and 987 heap
5> F(tail_sum, 10000000).
<0.94.0>
tail_sum took 69384us and 987 heap
6> F(tail_sum, 10000000).
<0.96.0>
tail_sum took 68606us and 987 heap
7> F(sum, 10000000).
<0.98.0>
sum took 954783us and 22177879 heap
8> F(sum, 10000000).
<0.100.0>
sum took 931335us and 22177879 heap
9> F(sum, 10000000).
<0.102.0>
sum took 934536us and 22177879 heap
10> F(sum, 10000000).
<0.104.0>
sum took 945380us and 22177879 heap
11> F(sum, 10000000).
<0.106.0>
sum took 921855us and 22177879 heap
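The one-line shell fun above can also be written as a small module that returns its measurements instead of printing them. This is only a sketch: the module name fresh_timer and the function measure/1 are hypothetical, and heap_size is reported in words, as erlang:process_info/2 returns it.

```erlang
-module(fresh_timer).
-export([measure/1]).

%% Run Fun in a brand new process so that no stack or heap grown by an
%% earlier run can be reused. Returns {Microseconds, HeapWords, Result},
%% or {error, Reason} if the measuring process dies first.
measure(Fun) when is_function(Fun, 0) ->
    Parent = self(),
    {Pid, MRef} = spawn_monitor(fun() ->
        {T, R} = timer:tc(Fun),
        {heap_size, H} = erlang:process_info(self(), heap_size),
        Parent ! {self(), T, H, R}
    end),
    receive
        {Pid, Micros, Words, Result} ->
            erlang:demonitor(MRef, [flush]),
            {Micros, Words, Result};
        {'DOWN', MRef, process, Pid, Reason} ->
            {error, Reason}
    end.
```

With the fact module from this answer loaded, fresh_timer:measure(fun() -> fact:sum(10000000) end) would take the place of F(sum, 10000000).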

Related

Spawning 1000 processes at the same time in Erlang

I want to spawn 1000 or a variable number of processes in Erlang.
server.erl:
-module(server).
-export([start/2]).
start(LeadingZeroes, InputString) ->
% io:format("Leading Zeroes: ~w", [LeadingZeroes]),
% io:format("InputString: ~p", [InputString]).
mineCoins(LeadingZeroes, InputString, 100).
mineCoins(LeadingZeroes, InputString, Target) ->
PID = spawn(miner, findTargetHash(), []), % How to spawn this process 1000 times so that each process computes something and sends the results here
PID ! {self(), {mine, LeadingZeroes, InputString, Target}},
receive
{found, Number} ->
io:fwrite("Rectangle area: ~w", [Number]);
% {square, Area} ->
% io:fwrite("Square area: ~w", [Area]);
Other ->
io:fwrite("In Other!")
end.
% io:fwrite("Yolo: ~w", [Square_Area]).
miner.erl (client):
-module(miner).
-export([findTargetHash/0]).
findTargetHash() ->
receive
{From , {mine, LeadingZeroes, InputString, Target}} ->
% do something here
From ! {found, Number};
{From, {else, X}} ->
io:fwrite("In Else area"),
From ! {square, X*X}
end,
findTargetHash().
Here I want to spawn 1000 miner processes. How does one achieve this: through list comprehensions, recursion, or some other way?
Generally, you can do something N times like this:
-module(a).
-compile(export_all).
go(0) ->
io:format("!finished!~n");
go(N) ->
io:format("Doing something: ~w~n", [N]),
go(N-1).
In the shell:
3> c(a).
a.erl:2:2: Warning: export_all flag enabled - all functions will be exported
% 2| -compile(export_all).
% | ^
{ok,a}
4> a:go(3).
Doing something: 3
Doing something: 2
Doing something: 1
!finished!
ok
If you need to start N processes and subsequently send messages to them, then you will need their pids to do that, so you will have to save their pids somewhere:
go(0, Pids) ->
io:format("All workers have been started.~n"),
Pids;
go(N, Pids) ->
Pid = spawn(b, worker, [self()]),
go(N-1, [Pid|Pids]).
-module(b).
-compile(export_all).
worker(From) ->
receive
{From, Data} ->
io:format("Worker ~w received ~w.~n", [self(), Data]),
From ! {self(), Data * 3};
Other ->
io:format("Error, received ~w.~n", [Other])
end.
To start N=3 worker processes, you would call go/2 like this:
Pids = a:go(3, []).
That's a little bit awkward for someone who didn't write the code: why do I have to pass an empty list? So, you could define a go/1 like this:
go(N) -> go(N, []).
Then, you can start 3 worker processes by simply writing:
Pids = go(3).
Next, you need to send each of the worker processes a message containing the work they need to do:
do_work([Pid|Pids], [Data|Datum]) ->
Pid ! {self(), Data},
do_work(Pids, Datum);
do_work([], []) ->
io:format("All workers have been sent their work.~n").
Finally, you need to gather the results from the workers:
gather_results([Worker|Workers], Results) ->
receive
{Worker, Result} ->
gather_results(Workers, [Result|Results])
end;
gather_results([], Results) ->
Results.
A couple of things to note about gather_results/2:
The Worker variable in the receive has already been assigned a value in the head of the function, so the receive is not waiting for just any worker process to send a message, rather the receive is waiting for a particular worker process to send a message.
The first Worker process in the list of Workers may be the longest running process, and you may wait in the receive for, say, 10 minutes for that process to finish, but then getting the results from the other worker processes will require no waiting. Therefore, gathering all the results will essentially take as long as the longest process plus a few microseconds to loop through the other processes. Similarly, for other orderings of the longest and shortest processes in the list, it will only take a time equal to the longest process plus a few microseconds to receive all the results.
Here is a test run in the shell:
27> c(a).
a.erl:2:2: Warning: export_all flag enabled - all functions will be exported
% 2| -compile(export_all).
% | ^
{ok,a}
28> c(b).
b.erl:2:2: Warning: export_all flag enabled - all functions will be exported
% 2| -compile(export_all).
% | ^
{ok,b}
29> Pids = a:go(3, []).
All workers have been started.
[<0.176.0>,<0.175.0>,<0.174.0>]
30> a:do_work(Pids, [1, 2, 3]).
All workers have been sent their work.
Worker <0.176.0> received 1.
Worker <0.175.0> received 2.
Worker <0.174.0> received 3.
ok
31> a:gather_results(Pids, []).
[9,6,3]
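Since the question also asks about list comprehensions, the same start, send and gather steps can be sketched that way too. The module name pool_sketch and the function run/1 are hypothetical. The worker mirrors b:worker/1 without the logging, and one worker is started per input item.

```erlang
-module(pool_sketch).
-export([run/1, worker/1]).

%% Same protocol as b:worker/1: wait for {Parent, Data}, reply with the result.
worker(Parent) ->
    receive
        {Parent, Data} -> Parent ! {self(), Data * 3}
    end.

run(Inputs) ->
    Parent = self(),
    %% Start one worker per input item.
    Pids = [spawn(?MODULE, worker, [Parent]) || _ <- Inputs],
    %% Send each worker its piece of work.
    lists:foreach(fun({Pid, Data}) -> Pid ! {Parent, Data} end,
                  lists:zip(Pids, Inputs)),
    %% Pid is already bound by the generator, so each receive waits for
    %% that particular worker and the results keep the input order.
    [receive {Pid, Result} -> Result end || Pid <- Pids].
```

Calling pool_sketch:run([1, 2, 3]) returns [3,6,9]. Unlike gather_results/2, the results come back in input order because nothing is prepended to an accumulator.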

How to efficiently read thousands of lines from STDIN in Erlang?

I've stumbled upon an issue when reading thousands of lines from STDIN. I would have considered this an imaginary edge case until I found out that some tests for this problem require reading thousands of lines from STDIN. At first I thought my algorithms were not optimal, and only by accident did I find out that just reading the lines, without any computation, makes half of the tests time out.
Here is the part of the code that times out:
process_queries(0, _) -> ok;
process_queries(N, A) ->
case io:fread("", "~s~d~d") of
{ok, _} -> process_queries(N - 1, A)
%{ok, ["Q", L, R]} -> process_queries(N - 1, apply_q(L, R, A));
%{ok, ["U", Idx, Val]} -> process_queries(N - 1, apply_u(Idx, Val, A))
end
.
I deliberately left the commented-out lines in to show that all the computations were disabled. Even so, this code timed out for N=7984.
Is there a better way of reading and processing thousands of lines from STDIN in Erlang?
io:get_line gets only one line at a time.
io:get_chars requires you to know how many characters to get.
I'd suggest switching stdio to binary and then using io:get_line. Your data's format is pretty simple to parse by splitting on whitespace and converting two values to integers. The following code runs ~10 times faster than your code for me in a simple benchmark. I used escript to benchmark, which means it's highly likely that the difference is actually more than 10 times since escript parses and compiles the code on the fly.
process_queries_2(0, _) -> ok;
process_queries_2(N, A) ->
Line = io:get_line(""),
[X, Y0, Z0, _] = binary:split(Line, [<<$\s>>, <<$\n>>], [global]),
Y = binary_to_integer(Y0),
Z = binary_to_integer(Z0),
% use X, Y, Z
process_queries_2(N - 1, A).
Here's the code I used to benchmark:
main(["1"]) ->
    ok = io:setopts(standard_io, [binary]),
    process_queries(10000, {});
main(["2"]) ->
    ok = io:setopts(standard_io, [binary]),
    process_queries_2(10000, {}).
$ time yes 'Q 100 200' | escript a.erl 1
yes 'Q 100 200' 4.64s user 0.11s system 93% cpu 5.063 total
escript a.erl 1 4.67s user 1.44s system 120% cpu 5.062 total
$ time yes 'Q 100 200' | escript a.erl 2
yes 'Q 100 200' 0.36s user 0.01s system 77% cpu 0.466 total
escript a.erl 2 0.40s user 0.10s system 106% cpu 0.464 total
The reason for the speedup is that Erlang strings are linked lists, which are very inefficient in both CPU time and memory usage compared to binaries, which are contiguous chunks of memory.
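The parsing step can be tried on its own, without STDIN, by isolating it as a pure function on a binary line. This is a small sketch: the module name query_line and the function parse/1 are hypothetical, and it uses the trim_all option of binary:split/3 instead of matching the trailing empty piece.

```erlang
-module(query_line).
-export([parse/1]).

%% Split a line such as <<"Q 100 200\n">> on spaces and newlines,
%% dropping empty pieces, and convert the two numeric fields.
parse(Line) when is_binary(Line) ->
    [Cmd, A, B] = binary:split(Line, [<<" ">>, <<"\n">>], [global, trim_all]),
    {Cmd, binary_to_integer(A), binary_to_integer(B)}.
```

For example, query_line:parse(<<"Q 100 200\n">>) returns {<<"Q">>,100,200}.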
Here is an excerpt from my solution. It uses a few tricks to make reading really efficient.
read_command(CP) ->
{ok, Line} = file:read_line(standard_io),
[C, A, B] = binary:split(Line, CP, [global, trim_all]),
{case C of <<"Q">> -> 'Q'; <<"U">> -> 'U' end,
binary_to_integer(A),
binary_to_integer(B)}.
read_commands(N, CP) ->
[ read_command(CP) || _ <- lists:seq(1, N) ].
execute(Array, L) ->
lists:foldl(fun({'Q', F, T}, A) ->
{Val, A2} = query(A, F, T),
file:write(standard_io, [integer_to_binary(Val), $\n]),
A2;
({'U', I, V}, A) ->
update(A, I, V)
end, Array, L).
read_int_line(CP) ->
{ok, Line} = file:read_line(standard_io),
[binary_to_integer(X) || X <- binary:split(Line, CP, [global, trim_all])].
main() ->
ok = io:setopts([binary]),
CP = binary:compile_pattern([<<" ">>, <<$\n>>]),
[N] = read_int_line(CP),
L = read_int_line(CP),
N = length(L),
[K] = read_int_line(CP),
execute(init(L), read_commands(K, CP)).
You have to write your own init/1, update/3 and query/3 of course.

spawn/1 and sharing of the outer variables

For the following fragment:
outer_func(State) ->
spawn(fun()-> do_something(State) end).
Will State be shared or deep-copied to the spawned process heap?
It will be deep copied. Here's a simple demo:
1> State = lists:seq(1, 1000000).
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
23,24,25,26,27,28,29|...]
2> DoSomething = fun(State) -> io:format("~p~n", [process_info(self(), memory)]) end.
3> spawn(fun() -> DoSomething(State) end), spawn(fun() -> DoSomething(State) end), spawn(fun() -> DoSomething(State) end).
{memory,16583520}
{memory,16583520}
{memory,16583520}
In contrast, here's the output when the state is a large binary. Large binaries (more than 64 bytes) are stored outside the process heaps and reference-counted, so they are never "deep" copied when shared between processes:
1> State = binary:copy(<<"a">>, 50000000).
<<"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"...>>
2> DoSomething = fun(State) -> io:format("~p~n", [process_info(self(), memory)]) end.
3> spawn(fun() -> DoSomething(State) end), spawn(fun() -> DoSomething(State) end), spawn(fun() -> DoSomething(State) end).
{memory,8744}
{memory,8744}
{memory,8744}
So a process with a list of integers from 1 to 1 million used about 16MB of memory while the one with a large binary used 8KB (the binary should actually be a negligible part of that).

Why does Erlang generate the same sequence of random number if applying the same seed?

I'm running the following commands in two Erlang shells, and both generate the same sequence of random numbers. Does that mean random numbers are pseudorandom in Erlang? I'm curious about the rationale, since in Java the sequence is not the same even if I provide the same seed twice. Many thanks!
random:seed(6, 6, 6).
random:uniform(100).
random:uniform(100).
...
the generated sequence: 12, 27, 79, 58, 90, 25, ...
What you're describing is how pseudorandom number generators (PRNGs) have always worked: the same seed produces the same sequence. That includes Erlang's random module, which I think implements Wichmann-Hill, though today's PRNGs are more sophisticated. In Erlang/OTP 18 you'll find a new rand module that seeds itself automatically in each process when you don't seed it explicitly, so you only get identical sequences if you ask for them by supplying the same seed.
As you can see from the shell session copied below, you can just call the rand:uniform/0,1 functions from different processes without seeding, and the initial numbers in the various processes will be different:
1> rand:uniform().
0.10584199892675317
2> Self = self().
<0.1573.0>
3> f(R), spawn(fun() -> Self ! rand:uniform() end), receive R -> R end.
0.9124422823012622
4> f(R), spawn(fun() -> Self ! rand:uniform() end), receive R -> R end.
0.9476479571869831
5> f(R), spawn(fun() -> Self ! rand:uniform() end), receive R -> R end.
0.037189460750910064
6> f(R), spawn(fun() -> Self ! rand:uniform() end), receive R -> R end.
0.17698653918897836
The first call runs directly in the shell process. We then get the shell's pid, store it into Self, and spawn four processes in succession that each send the results of rand:uniform/0 back to the shell, which receives it into R. As you can see, the four spawned processes each return different values, all of which differ from the value the shell got when it first ran rand:uniform/0.
If you want a number in a range other than 0-1, pass an integer N to rand:uniform/1 and you'll get a value V in the range 1 <= V <= N:
7> f(R), spawn(fun() -> Self ! rand:uniform(1234567) end), receive R -> R end.
510226
8> f(R), spawn(fun() -> Self ! rand:uniform(1234567) end), receive R -> R end.
562646
9> f(R), spawn(fun() -> Self ! rand:uniform(1234567) end), receive R -> R end.
250637
10> f(R), spawn(fun() -> Self ! rand:uniform(1234567) end), receive R -> R end.
820871
11> f(R), spawn(fun() -> Self ! rand:uniform(1234567) end), receive R -> R end.
121252
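For completeness, the rand module still reproduces a sequence when you do seed it explicitly. The sketch below uses the functional API (rand:seed_s/2 and rand:uniform_s/2) so that no process state is involved. The module name seed_demo and the function draw/2 are hypothetical, and the exsss algorithm is an assumption that requires OTP 22 or later.

```erlang
-module(seed_demo).
-export([draw/2]).

%% Draw Count integers in 1..100 from a generator seeded with Seed.
%% The generator state is threaded through explicitly, so the same
%% Seed always yields the same list.
draw(Seed, Count) ->
    State0 = rand:seed_s(exsss, Seed),
    {Values, _FinalState} =
        lists:mapfoldl(fun(_, State) -> rand:uniform_s(100, State) end,
                       State0, lists:seq(1, Count)),
    Values.
```

Calling seed_demo:draw({6, 6, 6}, 5) twice returns the same five numbers both times. The actual values depend on the algorithm, so they will not match the output of the old random module.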

is this higher-order function correct?

As part of learning Erlang, one of the problems I am trying to solve is
Write a higher-order function filter(F, L), which returns all the
elements X in L for which F(X) is true.
I am new to functional programming and very confused with higher-order functions as well.
My attempt looks like
filter(F, L) -> [T || T <- L, F(T) =:= true].
2> IsEven = fun(X) -> X rem 2 =:= 0 end.
#Fun<erl_eval.6.90072148>
3> IsEven(2).
true
4> IsEven(3).
false
5> math_functions:filter(IsEven, lists:seq(1, 10)).
[2,4,6,8,10]
6> math_functions:filter(IsEven, lists:seq(1, 20)).
[2,4,6,8,10,12,14,16,18,20]
Question
Is this really a higher-order function? Please guide me.
