Optimizing max Collatz sequence - erlang

I'm looking to optimize my solution for the maximum Collatz sequence problem in Erlang. Right now I've tried using ETS, and the following solution uses maps, but I'm getting worse performance than I feel I should. Is there perhaps some optimization I could do to improve it?
-module(collatzMaps).
-export([start/2, s/4]).
collatz(0, Map) ->
{0, Map};
collatz(M, Map) ->
Exists = maps:is_key(M, Map),
case Exists of
false ->
case M rem 2 == 0 of
true ->
Result = collatz(M div 2, Map),
Val = (1 + element(1, Result)),
Map1 = maps:put(M, Val, element(2, Result)),
{maps:get(M, Map1), Map1};
false ->
Result = collatz((3 * M + 1), Map),
Val = (1 + element(1, Result)),
Map2 = maps:put(M, Val, element(2, Result)),
{maps:get(M, Map2), Map2}
end;
true ->
{maps:get(M, Map), Map}
end.
s(N, M, Max, Map) ->
if
N =< M ->
Result = collatz(N, Map),
if
element(1, Result) > Max ->
NextMax = element(1, Result),
MapNext = element(2, Result),
s(N + 1, M, NextMax, MapNext);
true ->
MapNext = element(2, Result),
s(N + 1, M, Max, MapNext)
end;
true ->
Max
end.
start(N, M)->
statistics(runtime),
statistics(wall_clock),
Map = maps:new(),
Map1 = maps:put(1, 1, Map),
G = s(N, M, 0, Map1),
{_, Time2} = statistics(wall_clock),
U2 = Time2 / 1000,
io:format("~p seconds~n", [U2]),
G.

First, let's tweak the invocation so that we can collect some simple statistics and compare different approaches:
-export([start/2, max_collatz/2]).
...
max_collatz(N, M) ->
Map = maps:new(),
Map1 = maps:put(1, 1, Map),
s(N, M, 0, Map1).
start(N, M)->
{T, Result} = timer:tc( fun() -> max_collatz(N, M) end),
io:format("~p seconds~n", [T / 1000000]),
Result.
Now let's write it in a more idiomatic Erlang way:
-module(collatz).
-export([start/2, max_collatz/2]).
collatz_next(N) when N rem 2 =:= 0 ->
N div 2;
collatz_next(N) ->
3 * N + 1.
collatz_length(N, Map) ->
case Map of
#{N := L} -> {L, Map};
_ ->
{L, Map2} = collatz_length(collatz_next(N), Map),
{L + 1, Map2#{N => L + 1}}
end.
max_collatz(N, M) ->
Map = lists:foldl(fun(X, Map) -> {_, Map2} = collatz_length(X, Map), Map2 end,
#{1 => 1}, lists:seq(N, M)),
lists:max(maps:values(Map)).
start(N, M) ->
{T, Result} = timer:tc(fun() -> max_collatz(N, M) end),
io:format("~p seconds~n", [T / 1000000]),
Result.
Then we can compare the speed using, for example, eministat.
Clone and build it:
git clone https://github.com/jlouis/eministat.git
cd eministat
make
If you run into a problem like
DEPEND eministat.d
ERLC eministat.erl eministat_analysis.erl eministat_ds.erl eministat_plot.erl eministat_report.erl eministat_resample.erl eministat_ts.erl
compile: warnings being treated as errors
src/eministat_resample.erl:8: export_all flag enabled - all functions will be exported
erlang.mk:4940: recipe for target 'ebin/eministat.app' failed
make[1]: *** [ebin/eministat.app] Error 1
erlang.mk:4758: recipe for target 'app' failed
make: *** [app] Error 2
you can fix it with this patch:
diff --git src/eministat_resample.erl src/eministat_resample.erl
index 1adf401..0887b2c 100644
--- src/eministat_resample.erl
+++ src/eministat_resample.erl
@@ -5,7 +5,7 @@
-include("eministat.hrl").
-export([resample/3, bootstrap_bca/3]).
--compile(export_all).
+-compile([nowarn_export_all, export_all]).
%% @doc resample/3 is the main resampler of eministat
%% @end
Then run it:
$ erl -pa eministat/ebin/
Erlang/OTP 21 [erts-10.1] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]
Eshell V10.1 (abort with ^G)
1> c(collatzMaps), c(collatz).
{ok,collatz}
2> eministat:x(95.0, eministat:s(orig, fun() -> collatzMaps:max_collatz(1, 100000) end, 30), eministat:s(new, fun() -> collatz:max_collatz(1, 100000) end, 30)).
x orig
+ new
+--------------------------------------------------------------------------+
|+ ++++++++ +++++ * + +x+**+xxxx**x xxx xx+x xxx *x x + x x|
| + + + x x xx x |
| + |
| |_______M___A__________| |
| |________M_____A______________| |
+--------------------------------------------------------------------------+
Dataset: x N=30 CI=95.0000
Statistic Value [ Bias] (Bootstrapped LB‥UB)
Min: 1.76982e+5
1st Qu. 1.81610e+5
Median: 1.82954e+5
3rd Qu. 1.87030e+5
Max: 1.94944e+5
Average: 1.84280e+5 [ 8.00350] ( 1.82971e+5 ‥ 1.85749e+5)
Std. Dev: 3999.87 [ -102.524] ( 3128.74 ‥ 5431.13)
Outliers: 0/0 = 0 (μ=1.84288e+5, σ=3897.35)
Outlier variance: 3.22222e-2 (slight)
------
Dataset: + N=30 CI=95.0000
Statistic Value [ Bias] (Bootstrapped LB‥UB)
Min: 1.69179e+5
1st Qu. 1.72501e+5
Median: 1.74614e+5
3rd Qu. 1.79850e+5
Max: 1.90638e+5
Average: 1.76517e+5 [ 3.11862] ( 1.74847e+5 ‥ 1.78679e+5)
Std. Dev: 5343.46 [ -147.802] ( 4072.99 ‥ 7072.53)
Outliers: 0/0 = 0 (μ=1.76520e+5, σ=5195.66)
Outlier variance: 9.43164e-2 (slight)
Difference at 95.0% confidence
-7762.60 ± 2439.69
-4.21240% ± 1.32391%
(Student's t, pooled s = 4719.72)
------
ok
So the new version seems to be about 4% faster, which is not much. First, we can inline collatz_next/1, which is basically what you have in your collatz/2 function. I like to be explicit, so I put this between -export and the first function:
-compile({inline, [collatz_next/1]}).
It has very little effect:
Difference at 95.0% confidence
-9895.27 ± 5524.91
-5.24520% ± 2.92860%
(Student's t, pooled s = 1.06882e+4)
Then we can try unrolling lists:foldl/3, lists:seq/2 and lists:max/1, as in your s/4 function, but in a more idiomatic way:
max_collatz(N, M) ->
max_collatz(N, M, 1, #{1 => 1}).
max_collatz(M, M, Max, _) -> Max;
max_collatz(N, M, Max, Map) ->
case collatz_length(N + 1, Map) of
{L, Map2} when L > Max ->
max_collatz(N + 1, M, L, Map2);
{_, Map2} ->
max_collatz(N + 1, M, Max, Map2)
end.
Well, it's better, but still not by much:
Difference at 95.0% confidence
-1.78775e+4 ± 1980.35
-9.66832% ± 1.07099%
Now that we have removed all external function calls, it's worth trying native compilation (external function calls usually ruin any benefit of native compilation). We could also add a small type hint for HiPE, but it seems to have barely any effect (it is usually worth trying for floating-point arithmetic, which is not the case here, and the heavy use of maps is probably a problem as well).
max_collatz(N, M) when N < M, is_integer(N), is_integer(M) ->
max_collatz(N, M, 1, #{1 => 1}).
Not much better:
c(collatz, [native]).
...
Difference at 95.0% confidence
-2.26703e+4 ± 2651.32
-12.1721% ± 1.42354%
(Student's t, pooled s = 5129.13)
So it's time to try something dirty. The process dictionary is not the recommended place to store your data, but if it is confined to a dedicated process, it is an acceptable solution.
collatz_length(N) ->
case get(N) of
undefined ->
L = collatz_length(collatz_next(N)),
put(N, L + 1),
L + 1;
L -> L
end.
max_collatz(N, M) when N < M, is_integer(N), is_integer(M) ->
P = self(),
W = spawn_link(fun() ->
put(1, 1),
P ! {self(), max_collatz(N, M, 1)}
end),
receive {W, Max} -> Max end.
max_collatz(M, M, Max) -> Max;
max_collatz(N, M, Max) ->
case collatz_length(N + 1) of
L when L > Max ->
max_collatz(N + 1, M, L);
_ ->
max_collatz(N + 1, M, Max)
end.
Yes, it's a dirty but working solution, and it's worth it (even without native compilation):
Difference at 95.0% confidence
-1.98173e+5 ± 5450.92
-80.9384% ± 2.22628%
(Student's t, pooled s = 1.05451e+4)
So here we are, from 3.6 s down to 0.93 s using some dirty tricks. Anyway, if you had to do this sort of task for real, you would probably use a NIF written in C. This is not the type of task where Erlang shines.
> collatzMaps:start(1, 1000000).
3.576669 seconds
525
> collatz:start(1, 1000000).
0.931186 seconds
525
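As a sanity check for any of the memoized variants above, it can help to compare them against a plain, non-memoized length function on a small range. The following is only a minimal sketch; the module name collatz_check and the functions len/1 and max_len/2 are made up for illustration and are not part of the code above. It counts the final 1 as part of the sequence, like the versions above that seed the cache with 1 => 1.

```erlang
-module(collatz_check).
-export([len/1, max_len/2]).

%% Length of the Collatz sequence starting at N, counting the final 1,
%% so len(1) =:= 1. No memoization: slow, but easy to trust.
len(N) when is_integer(N), N >= 1 ->
    len(N, 1).

len(1, Acc) -> Acc;
len(N, Acc) when N rem 2 =:= 0 -> len(N div 2, Acc + 1);
len(N, Acc) -> len(3 * N + 1, Acc + 1).

%% Largest sequence length for any start value in From..To.
%% Builds the whole list of lengths, so use it on small ranges only.
max_len(From, To) when is_integer(From), is_integer(To), From =< To ->
    lists:max([len(N) || N <- lists:seq(From, To)]).
```

For example, collatz_check:max_len(1, 1000) should agree with collatzMaps:max_collatz(1, 1000) and collatz:max_collatz(1, 1000).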

Related

How to divide a string into substrings?

I would like to divide a string into substrings of a given length, for example:
divide("string",1) = ["s","t","r","i","n","g"].
I have tried this, but without success:
lists:split(1,"string") = {"s", "tring"}
Any idea?
I would calculate the length once (since it's a slow operation) and then recursively use lists:split/2 until the list left is smaller than N:
divide(List, N) ->
divide(List, N, length(List)).
divide(List, N, Length) when Length > N ->
{A, B} = lists:split(N, List),
[A | divide(B, N, Length - N)];
divide(List, _, _) ->
[List].
1> c(a).
{ok,a}
2> a:divide("string", 1).
["s","t","r","i","n","g"]
3> a:divide("string", 2).
["st","ri","ng"]
4> a:divide("string", 3).
["str","ing"]
5> a:divide("string", 4).
["stri","ng"]
6> a:divide("string", 5).
["strin","g"]
7> a:divide("string", 6).
["string"]
8> a:divide("string", 7).
["string"]
I think @Dogbert's solution is currently the best, but here is another implementation example using a recursive loop.
divide_test() ->
[?assertEqual(divide("string",1), ["s","t","r","i","n","g"]),
?assertEqual(divide("string",2), ["st","ri","ng"]),
?assertEqual(divide("string",3), ["str","ing"]),
?assertEqual(divide("string",4), ["stri","ng"])
].
-spec divide(list(), integer()) -> list(list()).
divide(String, Size)
when is_list(String), is_integer(Size) ->
divide(String, Size, 0, [], []).
-spec divide(list(), integer(), integer(), list(), list()) -> list(list()).
divide([], _, _, Buf, Result) ->
Return = [lists:reverse(Buf)] ++ Result,
lists:reverse(Return);
divide([H|T], Size, 0, Buf, Result) ->
divide(T, Size, 1, [H] ++ Buf, Result);
divide([H|T], Size, Counter, Buf, Result) ->
case Counter rem Size =:= 0 of
true ->
divide(T, Size, Counter+1, [H] ++ [], [lists:reverse(Buf)] ++ Result);
false ->
divide(T, Size, Counter+1, [H] ++ Buf, Result)
end.
You can try this function, provided the number is greater than 0 and less than or equal to the string length divided by two.
first_substring(List, Separator) ->
first_substring_loop(List, Separator, []).
first_substring_loop([], _, Reversed_First) ->
lists:reverse(Reversed_First);
first_substring_loop(List, Separator, Reversed_First) ->
[H|T]= my_tuple_to_list(lists:split(Separator,List)),
first_substring_loop(lists:flatten(T), Separator, [H|Reversed_First]).
my_tuple_to_list(Tuple) -> [element(T, Tuple) || T <- lists:seq(1, tuple_size(Tuple))].
The result is:
1> fact:first_substring("string", 1).
["s","t","r","i","n","g"]
2> fact:first_substring("string", 2).
["st","ri","ng"]
3> fact:first_substring("string", 3).
["str","ing"]
A short simple solution can be:
divide(String, Length) -> divide(String, Length, []).
divide([], _, Acc) -> Acc;
divide(String, Length, Acc) ->
{Res, Rest} = lists:split(min(Length, length(String)), String),
divide(Rest, Length, Acc ++ [Res]).
Also for a specific case of splitting with length 1, a list comprehension can be used:
ListOfLetters = [[Letter] || Letter <- String].

erlang; outsmarting compiler with memoization?

The following is my solution to Project Euler 14, which works (in 18 s):
%Which starting number, under one million, produces the longest Collatz chain?
-module(soln14).
-export([solve/0]).
collatz(L) ->
[H|T] = L,
F = erlang:get({'collatz', H}),
case is_list(F) of
true ->
R = lists:append(F, T);
false ->
if H == 1 ->
R = L;
true ->
if H rem 2 == 0 ->
R = collatz([H div 2 | L]);
true ->
R = collatz([3*H+1 | L])
end
end,
erlang:put({'collatz', lists:last(L)}, R),
R
end.
dosolve(N, Max, MaxN, TheList) ->
if N == 1000000 -> MaxN;
true ->
L = collatz([N]),
M = length(L),
if M > Max -> dosolve(N+1, M, N, L);
true ->
dosolve(N+1, Max, MaxN, TheList)
end
end.
solve() ->
{Megass, Ss, Micros} = erlang:timestamp(),
S = dosolve(1, -1, 1, []),
{Megase, Se, Microe} = erlang:timestamp(),
{Megase-Megass, Se-Ss, Microe-Micros, S}.
However, the compiler complains:
8> c(soln14).
soln14.erl:20: Warning: variable 'R' is unused
{ok,soln14}
9> soln14:solve().
{0,18,-386776,837799}
Is this a compiler scoping error, or do I have a legit bug?
It's not a compiler error, just a warning that in the true branch of "case is_list(F) of", binding R to the result of lists:append() is pointless, since this value of R is not used after that point; it is just returned immediately. I'll leave it to you to figure out whether that's a bug or not. It may be that you are fooled by your indentation. The lines "erlang:put(...)," and "R" are both still within the "false" branch of "case is_list(F) of", and should be indented more deeply to reflect this.
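A common way to avoid this kind of warning is to bind the value of the whole case expression once, rather than binding the same variable separately inside each branch. The sketch below is a generic illustration of that idiom, not a rewrite of the code in the question; the module name case_binding and the function classify/1 are hypothetical.

```erlang
-module(case_binding).
-export([classify/1]).

%% R is bound exactly once, to whatever the selected branch returns,
%% and is then used after the case expression.
classify(N) when is_integer(N) ->
    R = case N rem 2 of
            0 -> even;
            _ -> odd
        end,
    {N, R}.
```

Written this way, each branch simply returns its value, and no branch contains a binding that nothing reads afterwards.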
The warning message and the code are not "synchronized": with the version you give, the warning is on line 10: R = lists:append(F, T);.
It means that you bind the result of the lists:append/2 call to R, but you don't use R later in the true branch.
This is not the case in the false branch, since you use R in the call to erlang:put/2.
You could write the code this way:
%Which starting number, under one million, produces the longest Collatz chain?
-module(soln14).
-export([solve/0,dosolve/4]).
collatz(L) ->
[H|T] = L,
F = erlang:get({'collatz', H}),
case is_list(F) of
true ->
lists:append(F, T);
false ->
R = if H == 1 ->
L;
true ->
if H rem 2 == 0 ->
collatz([H div 2 | L]);
true ->
collatz([3*H+1 | L])
end
end,
erlang:put({'collatz', lists:last(L)}, R),
R
end.
dosolve(N, Max, MaxN, TheList) ->
if N == 1000000 -> MaxN;
true ->
L = collatz([N]),
M = length(L),
if M > Max -> dosolve(N+1, M, N, L);
true ->
dosolve(N+1, Max, MaxN, TheList)
end
end.
solve() ->
timer:tc(?MODULE,dosolve,[1, -1, 1, []]).
Warning: the code uses a huge amount of memory, collatz is not tail recursive, and it seems that some garbage collection is not being done.

Iterate over a cartesian product in Erlang without generating a list first

What's the Erlang equivalent to the following Python code:
for x in range(9):
for y in range(9):
for z in range(9):
foo(x, y, z)
I know I can generate the product first with C = [{X,Y,Z} || X<- lists:seq(1,9), Y<- lists:seq(1,9), Z<- lists:seq(1,9)] then foo([])->done; foo([H|T])->blah blah.
How do I do it without an auxiliary list, using recursion only?
You could do it with three recursive functions.
You might be able to do it with some complex pattern matching in the function head.
But the easiest way to skip creating the auxiliary list is to call your function inside the list comprehension:
C = [foo(X, Y, Z) || X<- lists:seq(1,9),
Y<- lists:seq(1,9),
Z<- lists:seq(1,9)]
where foo/3 processes one element.
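The "three recursive functions" option mentioned above could look roughly like the following sketch, which visits every combination without building any list. The module name nested_loop and the function each3/3 are made-up names, and the callback is passed in as a fun rather than being hard-coded as foo/3.

```erlang
-module(nested_loop).
-export([each3/3]).

%% Calls Fun(X, Y, Z) for every X, Y, Z in From..To, in the same order
%% as three nested loops, without building any intermediate list.
each3(From, To, Fun) when is_integer(From), is_integer(To), is_function(Fun, 3) ->
    loop_x(From, From, To, Fun).

%% Outer loop over X.
loop_x(X, _From, To, _Fun) when X > To -> ok;
loop_x(X, From, To, Fun) ->
    loop_y(X, From, From, To, Fun),
    loop_x(X + 1, From, To, Fun).

%% Middle loop over Y; From is kept so that Z can restart for each Y.
loop_y(_X, Y, _From, To, _Fun) when Y > To -> ok;
loop_y(X, Y, From, To, Fun) ->
    loop_z(X, Y, From, To, Fun),
    loop_y(X, Y + 1, From, To, Fun).

%% Inner loop over Z; this is where the callback is invoked.
loop_z(_X, _Y, Z, To, _Fun) when Z > To -> ok;
loop_z(X, Y, Z, To, Fun) ->
    Fun(X, Y, Z),
    loop_z(X, Y, Z + 1, To, Fun).
```

With this sketch, nested_loop:each3(0, 8, fun foo/3) would correspond to the Python loops over range(9), and nested_loop:each3(1, 9, fun foo/3) to the lists:seq(1,9) version.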
A list comprehension still forces you to create auxiliary lists in memory.
When dealing with huge data sets you should avoid that. Writing recursive functions every time is also awkward, so I came up with my own generic for function. It's a little slower to traverse than direct recursion or a list comprehension, but it's memory-stable, generic and easy to use.
Usage:
(for({10}))(
fun (X) -> io:format("~p ",[X]) end).
> 1 2 3 4 5 6 7 8 9 10
(for({10, -10, -2}))(
fun (X) -> io:format("~p ",[X]) end).
> 10 8 6 4 2 0 -2 -4 -6 -8 -10
Works with lists too:
(for(lists:seq(10, -10, -2)))(
fun (X) -> io:format("~p ",[X]) end).
> 10 8 6 4 2 0 -2 -4 -6 -8 -10
It's also possible to define step or guard as a function:
(for({256, 1.1, fun (X) -> math:sqrt(X) end, fun (X, Range) -> X > Range end}))(
fun (X) -> io:format("~p ",[X]) end).
> 256 16.0 4.0 2.0 1.4142135623730951 1.189207115002721
If you pass a two-parameter function to for, you can use an accumulator, just like with lists:foldl/3. You also need to pass the initial accumulator to for:
Fact = (for(1, {1, 5}))(
fun(X, Acc) ->
X * Acc
end),
io:format("~p", [Fact]).
> 120
e_fact(N) ->
{_, E} = (for({1, 1}, {1, N}))( % i assumed 1/0! equals 1
fun(X, {LastFact, Sum}) ->
Fact = LastFact * X,
{Fact, Sum + 1 / Fact}
end),
E.
io:format("e=~p", [e_fact(10)]).
> e=2.7182818011463845
The step and guard functions can also depend on the accumulator. Just pass a function with one more parameter.
Nested loops for finding Pythagorean triples are easy with closures:
pyth_lists(N) ->
[io:format("~p ", [{A, B, C}]) ||
A <- lists:seq(1, N),
B <- lists:seq(A + 1, N),
C <- lists:seq(B + 1, N),
A * A + B * B == C * C].
pyth_for(N) ->
(for({1, N}))(
fun(A) ->
(for({A + 1, N}))(
fun(B) ->
(for({B + 1, N}))(
fun(C) ->
case A * A + B * B == C * C of
true -> io:format("~p ", [{A, B, C}]);
false -> ok
end
end)
end)
end).
It's too small for a separate repository, so I keep it in my utilities module.
If you find it helpful, here is the code:
-export([for/1, for/2]).
for(Through) ->
for([], Through).
for(InitAcc, Opts) when is_tuple(Opts) ->
{Init, Range, Step, Guard} = for_apply_default_opts(Opts),
fun(Fun) ->
UpdFun = if
is_function(Fun, 1) ->
fun(I, _FAcc) -> Fun(I) end;
is_function(Fun, 2) ->
Fun
end,
for_iter(UpdFun, InitAcc, Init, Range, Step, Guard) end;
for(InitAcc, List) when is_list(List) ->
fun(Fun) -> for_list_eval(Fun, InitAcc, List) end.
for_iter(Fun, Acc, I, Range, Step, Guard) ->
case Guard(I, Range, Acc) of
false ->
Acc;
true ->
NewAcc = Fun(I, Acc),
for_iter(Fun, NewAcc, Step(I, NewAcc), Range, Step, Guard)
end.
for_list_eval(Fun, Acc, List) ->
if
is_function(Fun, 1) ->
lists:foreach(Fun, List);
is_function(Fun, 2) ->
lists:foldl(Fun, Acc, List)
end.
for_apply_default_opts({Range}) ->
DefaultInit = 1,
for_apply_default_opts({DefaultInit, Range});
for_apply_default_opts({Init, Range}) ->
DefaultStep = 1,
for_apply_default_opts({Init, Range, DefaultStep});
for_apply_default_opts({Init, Range, Step}) ->
DefaultGuard = case (Step > 0) or is_function(Step) of
true -> fun(I, IterRange, _Acc) -> I =< IterRange end;
false -> fun(I, IterRange, _Acc) -> I >= IterRange end
end,
for_apply_default_opts({Init, Range, Step, DefaultGuard});
for_apply_default_opts({Init, Range, Step, Guard}) when is_function(Guard, 2) ->
for_apply_default_opts({Init, Range, Step, fun(I, IterRange, _Acc) -> Guard(I, IterRange) end});
for_apply_default_opts({Init, Range, Step, DefaultGuard}) when is_number(Step) ->
for_apply_default_opts({Init, Range, fun(I, _Acc) -> I + Step end, DefaultGuard});
for_apply_default_opts({Init, Range, Step, DefaultGuard}) when is_function(Step, 1) ->
for_apply_default_opts({Init, Range, fun(I, _Acc) -> Step(I) end, DefaultGuard});
for_apply_default_opts({_Init, _Range, _Step, _DefaultGuard} = Opts) ->
Opts.

Splitting a list in equal sized chunks in Erlang

I want to split:
[1,2,3,4,5,6,7,8]
into:
[[1,2],[3,4],[5,6],[7,8]]
It generally works great with:
[ lists:sublist(List, X, 2) || X <- lists:seq(1,length(List),2) ] .
But it is really slow this way: 10000 elements take an amazing 2.5 seconds on my netbook. I have also written a really fast recursive function, but I am simply curious: could this list comprehension be written in a different way, so that it is faster?
Try this:
part(List) ->
part(List, []).
part([], Acc) ->
lists:reverse(Acc);
part([H], Acc) ->
lists:reverse([[H]|Acc]);
part([H1,H2|T], Acc) ->
part(T, [[H1,H2]|Acc]).
Test in the Erlang shell (I've declared this function in module part):
2> part:part([1,2,3,4,5,6,7,8]).
[[1,2],[3,4],[5,6],[7,8]]
3>
3> timer:tc(part, part, [lists:seq(1,10000)]).
{774,
[[1,2],
[3,4],
[5,6],
[7,8],
"\t\n","\v\f",
[13,14],
[15,16],
[17,18],
[19,20],
[21,22],
[23,24],
[25,26],
[27,28],
[29,30],
[31,32],
"!\"","#$","%&","'(",")*","+,","-.","/0","12","34",
[...]|...]}
Just 774 microseconds (which is ~0.8 milliseconds).
Here are two quick solutions for you that are both flexible. One is easy to read, but only slightly faster than your proposed solution. The other is quite fast, but is a bit cryptic to read. And note that both of my proposed algorithms will work for lists of anything, not just numeric ordered lists.
Here is the "easy-to-read" one. Call by n_length_chunks(List,Chunksize). For example, to get a list of chunks 2 long, call n_length_chunks(List,2). This works for chunks of any size, ie, you could call n_length_chunks(List,4) to get [[1,2,3,4],[5,6,7,8],...]
n_length_chunks([],_) -> [];
n_length_chunks(List,Len) when Len > length(List) ->
[List];
n_length_chunks(List,Len) ->
{Head,Tail} = lists:split(Len,List),
[Head | n_length_chunks(Tail,Len)].
The much faster one is below, but it is definitely harder to read. It is called in the same way: n_length_chunks_fast(List,2). I've made one change compared with the one above: it pads the end of the list with undefined if the length of the list isn't cleanly divisible by the desired chunk length.
n_length_chunks_fast(List,Len) ->
LeaderLength = case length(List) rem Len of
0 -> 0;
N -> Len - N
end,
Leader = lists:duplicate(LeaderLength,undefined),
n_length_chunks_fast(Leader ++ lists:reverse(List),[],0,Len).
n_length_chunks_fast([],Acc,_,_) -> Acc;
n_length_chunks_fast([H|T],Acc,Pos,Max) when Pos==Max ->
n_length_chunks_fast(T,[[H] | Acc],1,Max);
n_length_chunks_fast([H|T],[HAcc | TAcc],Pos,Max) ->
n_length_chunks_fast(T,[[H | HAcc] | TAcc],Pos+1,Max);
n_length_chunks_fast([H|T],[],Pos,Max) ->
n_length_chunks_fast(T,[[H]],Pos+1,Max).
Tested on my (really old) laptop:
Your proposed solution took about 3 seconds.
My slow-but-readable one was slightly faster and took about 1.5 seconds (still quite slow).
My fast version takes about 5 milliseconds.
For completeness, Isac's solution took about 180 milliseconds on my same machine.
Edit: wow, I need to read the complete question first. Oh well, I'll keep this here for posterity in case it helps. As far as I can tell, there's no good way to do this using list comprehensions. Your original version is slow because each call to sublist has to traverse the list from the start to reach each successive X, resulting in complexity just under O(N^2).
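To illustrate the point about repeated traversals: the following sketch chunks a list in a single pass, never calling length/1 or lists:sublist/3 on the remaining list. It is not one of the solutions posted in this thread; the module name chunker and the helper take/3 are made up for illustration. The last chunk is simply shorter when the list length is not a multiple of the chunk size.

```erlang
-module(chunker).
-export([chunks/2]).

%% Split List into consecutive chunks of Len elements, preserving order.
chunks(List, Len) when is_list(List), is_integer(Len), Len > 0 ->
    chunks(List, Len, []).

chunks([], _Len, Acc) ->
    lists:reverse(Acc);
chunks(List, Len, Acc) ->
    {Chunk, Rest} = take(Len, List, []),
    chunks(Rest, Len, [Chunk | Acc]).

%% Take up to N elements from the front of the list and return them
%% together with the untouched remainder.
take(0, Rest, Acc) -> {lists:reverse(Acc), Rest};
take(_N, [], Acc) -> {lists:reverse(Acc), []};
take(N, [H | T], Acc) -> take(N - 1, T, [H | Acc]).
```

Each element is visited a constant number of times (once by take/3 and once by a reverse), so the total work grows linearly with the list length.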
Or with a fold:
lists:foldr(fun(E, []) -> [[E]];
(E, [H|RAcc]) when length(H) < 2 -> [[E|H]|RAcc] ;
(E, [H|RAcc]) -> [[E],H|RAcc]
end, [], List).
I want to submit a slightly more complicated but more flexible (and mostly faster) version of the solution proposed by @Tilman:
split_list(List, Max) ->
element(1, lists:foldl(fun
(E, {[Buff|Acc], C}) when C < Max ->
{[[E|Buff]|Acc], C+1};
(E, {[Buff|Acc], _}) ->
{[[E],Buff|Acc], 1};
(E, {[], _}) ->
{[[E]], 1}
end, {[], 0}, List)).
so the function part can be implemented as
part(List) ->
RevList = split_list(List, 2),
lists:foldl(fun(E, Acc) ->
[lists:reverse(E)|Acc]
end, [], RevList).
Update:
I've added the reverse in case you want to preserve order, but as far as I can see it adds no more than 20% to the processing time.
You could do it like this:
1> {List1, List2} = lists:partition(fun(X) -> (X rem 2) == 1 end, List).
{[1,3,5|...],[2,4,6|...]}
2> lists:zipwith(fun(X, Y) -> [X, Y] end, List1, List2).
[[1,2],[3,4],[5,6]|...]
This takes ~73 milliseconds with a 10000-element List on my computer. The original solution takes ~900 milliseconds.
But I would go with the recursive function anyway.
I was looking for a partition function which can split a large list among a small number of workers. With lkuty's partition, one worker might get almost double the work of all the others. If that's not what you want, here is a version in which sublist lengths differ by at most 1.
Uses PropEr for testing.
%% @doc Split List into sub-lists so sub-list lengths differ by at most 1.
%% Does not preserve order.
-spec split_many(pos_integer(), [T]) -> [[T]] when T :: term().
split_many(N, List) ->
PieceLen = length(List) div N,
lists:reverse(split_many(PieceLen, N, List, [])).
-spec split_many(pos_integer(), pos_integer(), [T], [[T]]) ->
[[T]] when T :: term().
split_many(PieceLen, N, List, Acc) when length(Acc) < N ->
{Head, Tail} = lists:split(PieceLen, List),
split_many(PieceLen, N, Tail, [Head|Acc]);
split_many(_PieceLen, _N, List, Acc) ->
% Add an Elem to each list in Acc
{Appendable, LeaveAlone} = lists:split(length(List), Acc),
Appended = [[Elem|XS] || {Elem, XS} <- lists:zip(List, Appendable)],
lists:append(Appended, LeaveAlone).
Tests:
split_many_test_() ->
[
?_assertEqual([[1,2]], elibs_lists:split_many(1, [1,2])),
?_assertEqual([[1], [2]], elibs_lists:split_many(2, [1,2])),
?_assertEqual([[1], [3,2]], elibs_lists:split_many(2, [1,2,3])),
?_assertEqual([[1], [2], [4,3]], elibs_lists:split_many(3, [1,2,3,4])),
?_assertEqual([[1,2], [5,3,4]], elibs_lists:split_many(2, [1,2,3,4,5])),
?_assert(proper:quickcheck(split_many_proper1())),
?_assert(proper:quickcheck(split_many_proper2()))
].
%% @doc Verify all elements are preserved, number of groups is correct,
%% all groups have same number of elements (+-1)
split_many_proper1() ->
?FORALL({List, Groups},
{list(), pos_integer()},
begin
Split = elibs_lists:split_many(Groups, List),
% Lengths of sub-lists
Lengths = lists:usort(lists:map(fun erlang:length/1, Split)),
length(Split) =:= Groups andalso
lists:sort(lists:append(Split)) == lists:sort(List) andalso
length(Lengths) =< 2 andalso
case Lengths of
[Min, Max] -> Max == Min + 1;
[_] -> true
end
end
).
%% @doc If the number of elements is divisible by the number of groups,
%% ordering must stay the same
split_many_proper2() ->
?FORALL({Groups, List},
?LET({A, B},
{integer(1, 20), integer(1, 10)},
{A, vector(A*B, term())}),
List =:= lists:append(elibs_lists:split_many(Groups, List))
).
Here is a more general answer that works with any sublist size.
1> lists:foreach(fun(N) -> io:format("~2.10.0B -> ~w~n",[N, test:partition([1,2,3,4,5,6,7,8,9,10],N)] ) end, [1,2,3,4,5,6,7,8,9,10]).
01 -> [[1],[2],[3],[4],[5],[6],[7],[8],[9],[10]]
02 -> [[1,2],[3,4],[5,6],[7,8],[9,10]]
03 -> [[1,2,3],[4,5,6],[7,8,9],[10]]
04 -> [[1,2,3,4],[5,6,7,8],[10,9]]
05 -> [[1,2,3,4,5],[6,7,8,9,10]]
06 -> [[1,2,3,4,5,6],[10,9,8,7]]
07 -> [[1,2,3,4,5,6,7],[10,9,8]]
08 -> [[1,2,3,4,5,6,7,8],[10,9]]
09 -> [[1,2,3,4,5,6,7,8,9],[10]]
10 -> [[1,2,3,4,5,6,7,8,9,10]]
And the code to achieve this is stored inside a file called test.erl:
-module(test).
-compile(export_all).
partition(List, N) ->
partition(List, 1, N, []).
partition([], _C, _N, Acc) ->
lists:reverse(Acc) ;
partition([H|T], 1, N, Acc) ->
partition(T, 2, N, [[H]|Acc]) ;
partition([H|T], C, N, [HAcc|TAcc]) when C < N ->
partition(T, C+1, N, [[H|HAcc]|TAcc]) ;
partition([H|T], C, N, [HAcc|TAcc]) when C == N ->
partition(T, 1, N, [lists:reverse([H|HAcc])|TAcc]) ;
partition(L, C, N, Acc) when C > N ->
partition(L, 1, N, Acc).
It could probably be more elegant regarding the special case where C > N. Note that C is the size of the current sublist being constructed. At the start, it is 1, and then it increments until it reaches the partition size N.
We could also use a modified version of @chops's code to let the last list contain the remaining items even if its size is < N:
-module(n_length_chunks_fast).
-export([n_length_chunks_fast/2]).
n_length_chunks_fast(List,Len) ->
SkipLength = case length(List) rem Len of
0 -> 0;
N -> Len - N
end,
n_length_chunks_fast(lists:reverse(List),[],SkipLength,Len).
n_length_chunks_fast([],Acc,_Pos,_Max) -> Acc;
n_length_chunks_fast([H|T],Acc,Pos,Max) when Pos==Max ->
n_length_chunks_fast(T,[[H] | Acc],1,Max);
n_length_chunks_fast([H|T],[HAcc | TAcc],Pos,Max) ->
n_length_chunks_fast(T,[[H | HAcc] | TAcc],Pos+1,Max);
n_length_chunks_fast([H|T],[],Pos,Max) ->
n_length_chunks_fast(T,[[H]],Pos+1,Max).
I've slightly altered the implementation from @JLarky to remove the guard expression, which should be slightly faster:
split_list(List, Max) ->
element(1, lists:foldl(fun
(E, {[Buff|Acc], 1}) ->
{[[E],Buff|Acc], Max};
(E, {[Buff|Acc], C}) ->
{[[E|Buff]|Acc], C-1};
(E, {[], _}) ->
{[[E]], Max}
end, {[], Max}, List)).

Fibonacci Matrix

To calculate the Fibonacci sequence in O(log n) we use matrix exponentiation, since the recurrence
fn = fn-1 + fn-2 is linear. But what matrix is required if we want to find the nth term of
fn = fn-1 + fn-2 + a0 + a1*n + a2*n^2 + ... an*n^n
which also depends on a polynomial in n?
Here a0, a1, ... an are constants.
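One possible way to set up such a matrix, shown here for the quadratic case only (a0 + a1*n + a2*n^2), is to extend the state vector with the powers of n that the polynomial needs. This is a sketch based on the standard "augment the state" technique, not something taken from the answer below; the choice of (n+1)^2, n+1 and 1 as extra state entries is an assumption made so that every row stays linear.

```latex
\[
\begin{pmatrix} f_{n+1} \\ f_n \\ (n+2)^2 \\ n+2 \\ 1 \end{pmatrix}
=
\begin{pmatrix}
1 & 1 & a_2 & a_1 & a_0 \\
1 & 0 & 0   & 0   & 0   \\
0 & 0 & 1   & 2   & 1   \\
0 & 0 & 0   & 1   & 1   \\
0 & 0 & 0   & 0   & 1
\end{pmatrix}
\begin{pmatrix} f_n \\ f_{n-1} \\ (n+1)^2 \\ n+1 \\ 1 \end{pmatrix}
\]
```

The first row is the recurrence itself, f_{n+1} = f_n + f_{n-1} + a2*(n+1)^2 + a1*(n+1) + a0, and the third and fourth rows use (n+2)^2 = (n+1)^2 + 2(n+1) + 1 and n+2 = (n+1) + 1. For a polynomial of degree k the same construction gives a (k+3) x (k+3) matrix whose lower-right block contains binomial coefficients.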
Look here for an implementation in Erlang which uses the formula
[[1, 1], [1, 0]]^n = [[F(n+1), F(n)], [F(n), F(n-1)]]. It shows nice linear resulting behavior because in O(M(n) log n) part M(n) is exponential for big numbers. It calculates fib of one million in 2 s, where the result has 208988 digits. The trick is that you can compute exponentiation in O(log n) multiplications using a (tail) recursive formula (tail means O(1) space when you use a proper compiler or rewrite it as a loop):
% compute X^N
power(X, N) when is_integer(N), N >= 0 ->
power(N, X, 1).
power(0, _, Acc) ->
Acc;
power(N, X, Acc) ->
if N rem 2 =:= 1 ->
power(N - 1, X, Acc * X);
true ->
power(N div 2, X * X, Acc)
end.
where you substitute matrices for X and Acc. X is initialized with [[1, 1], [1, 0]] and Acc with the identity matrix I = [[1, 0], [0, 1]].
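A minimal sketch of that substitution for plain Fibonacci numbers follows. It assumes the usual matrix [[1, 1], [1, 0]], whose Nth power holds F(N) in its off-diagonal entries; the module name fib_matrix, the tuple representation {A, B, C, D} for [[A, B], [C, D]], and the helper mul/2 are choices made for this illustration.

```erlang
-module(fib_matrix).
-export([fib/1]).

%% fib(0) = 0, fib(1) = 1, fib(2) = 1, ...
%% The top-right entry of [[1,1],[1,0]]^N is F(N).
fib(N) when is_integer(N), N >= 0 ->
    {_, F, _, _} = power(N, {1, 1, 1, 0}, {1, 0, 0, 1}),
    F.

%% Same shape as power/3 above, with matrix multiplication in place
%% of integer multiplication. Acc starts as the identity matrix.
power(0, _X, Acc) -> Acc;
power(N, X, Acc) when N rem 2 =:= 1 -> power(N - 1, X, mul(Acc, X));
power(N, X, Acc) -> power(N div 2, mul(X, X), Acc).

%% Product of two 2x2 matrices stored as {A, B, C, D} = [[A, B], [C, D]].
mul({A, B, C, D}, {E, F, G, H}) ->
    {A * E + B * G, A * F + B * H,
     C * E + D * G, C * F + D * H}.
```

Because Erlang integers are arbitrary precision, the same code works for large N; only the cost of each multiplication grows with the size of the numbers.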

Resources