I am doing something horrible but I don't know how to make it better.
I am forming all pairwise sums of the elements of a list called SomeList, but I don't want to see duplicates (I guess I want "all possible pairwise sums"):
sets:to_list(sets:from_list([A+B || A <- SomeList, B <- SomeList]))
SomeList does NOT contain duplicates.
This works, but is horribly inefficient, because the original list before the set conversion is GIGANTIC.
Is there a better way to do this?
You could simply use lists:usort/1, which sorts the list and drops duplicates in one call:
lists:usort([X+Y || X <- L, Y <- L]).
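Since A+B and B+A give the same sum, roughly half of the comprehension's work is redundant. A minimal sketch of exploiting that, assuming a hypothetical module name pair_sums (not from the original answer): each element is paired only with itself and the elements after it, and the result is then passed to lists:usort/1.

```erlang
-module(pair_sums).
-export([unique_sums/1]).

%% Sorted list of all distinct sums A+B for A, B taken from List.
unique_sums(List) ->
    lists:usort(half_sums(List)).

%% Pair the head with itself and with every later element only;
%% the mirrored pair B+A would produce the same sum anyway.
half_sums([]) -> [];
half_sums([A | Rest] = All) ->
    [A + B || B <- All] ++ half_sums(Rest).
```

This still builds an intermediate list of N*(N+1)/2 sums before deduplicating, so it roughly halves the memory rather than removing the problem.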
If the chance of duplicates is very high, you can instead generate the sums with two loops and store each sum in an ets table of type set (or in a map; I didn't compare the performance of the two). That way the full list of N*N sums is never built:
7> Inloop = fun Inloop(_,[],_) -> ok; Inloop(Store,[H|T],X) -> ets:insert(Store,{X+H}), Inloop(Store,T,X) end.
#Fun<erl_eval.42.54118792>
8> Outloop = fun Outloop(Store,[],_) -> ok; Outloop(Store,[H|T],List) -> Inloop(Store,List,H), Outloop(Store,T,List) end.
#Fun<erl_eval.42.54118792>
9> Makesum = fun(L) -> S = ets:new(temp,[set]), Outloop(S,L,L), R =ets:foldl(fun({X},Acc) -> [X|Acc] end,[],S), ets:delete(S), R end.
#Fun<erl_eval.6.54118792>
10> Makesum(lists:seq(1,10)).
[15,13,8,11,20,14,16,12,7,3,10,9,19,18,4,17,6,2,5]
11> lists:sort(Makesum(lists:seq(1,10))).
[2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
12>
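The map alternative mentioned above could look roughly like this. It is only a sketch under assumptions: the module name pair_sums_map is made up for illustration, and no performance claim is made relative to ets.

```erlang
-module(pair_sums_map).
-export([unique_sums/1]).

%% Collect every sum as a map key. A duplicate sum overwrites the same key,
%% so the full N*N list of sums is never materialised.
unique_sums(List) ->
    Seen = lists:foldl(
             fun(A, Acc0) ->
                 lists:foldl(fun(B, Acc) -> maps:put(A + B, true, Acc) end,
                             Acc0, List)
             end,
             maps:new(), List),
    %% The order of maps:keys/1 is not something to rely on; sort if needed.
    maps:keys(Seen).
```

As with the ets version, the result should be passed through lists:sort/1 if an ordered list is wanted.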
This module will allow you to compare times of execution when using list comprehension, sets or ets. You can of course add additional functions to this comparison:
-module(pairwise).
-export([start/2]).

%% Time one strategy (compr | set | ets) on the list 1..X.
%% Returns {Microseconds, Result}.
start(Type, X) ->
    L = lists:seq(1, X),
    timer:tc(fun do/2, [Type, L]).

do(compr, L) ->
    sets:to_list(sets:from_list([A+B || A <- L, B <- L]));
do(set, L) ->
    F = fun(Sum, Set) -> sets:add_element(Sum, Set) end,
    R = fun(Set) -> sets:to_list(Set) end,
    do(L, L, sets:new(), {F, R});
do(ets, L) ->
    F = fun(Sum, Tab) -> ets:insert(Tab, {Sum}), Tab end,
    R = fun(Tab) ->
            Fun = fun({X}, Acc) -> [X|Acc] end,
            Res = ets:foldl(Fun, [], Tab),
            ets:delete(Tab),
            Res
        end,
    do(L, L, ets:new(?MODULE, []), {F, R}).

%% do/4 pairs the head A with itself and every later element, then moves on
%% to the tail. F stores a sum in S; R turns S into the final list.
do([A|AT], [B|BT], S, {F, _} = Funs) -> do([A|AT], BT, F(A+B, S), Funs);
do([_A], [], S, {_, R}) -> R(S);
do([_A|AT], [], S, Funs) -> do(AT, AT, S, Funs).
Results:
36> {_, Res1} = pairwise:start(compr, 20).
{282,
[16,32,3,19,35,6,22,38,9,25,12,28,15,31,2,18,34,5,21,37,8,
24,40,11,27,14,30|...]}
37> {_, Res2} = pairwise:start(set, 20).
{155,
[16,32,3,19,35,6,22,38,9,25,12,28,15,31,2,18,34,5,21,37,8,
24,40,11,27,14,30|...]}
38> {_, Res3} = pairwise:start(ets, 20).
{96,
[15,25,13,8,21,24,40,11,26,20,14,28,23,16,12,39,34,36,7,32,
35,3,33,10,9,19,18|...]}
39> R1=lists:usort(Res1), R2=lists:usort(Res2), R3=lists:usort(Res3).
[2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,
24,25,26,27,28,29,30|...]
40> R1 = R2 = R3.
[2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,
24,25,26,27,28,29,30|...]
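The last line checks that all three functions return the same set of sums, just in a different order.
The first number in each result tuple is the execution time in microseconds, as returned by timer:tc(fun do/2, [Type, L]). In this example it is 282 for the list comprehension, 155 for sets and 96 for ets. Note that the compr clause computes all N*N sums, while the set and ets clauses only compute the N*(N+1)/2 sums where B is A or a later element, so part of the difference comes from doing less work.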
An effective way is to use foldl instead of a list comprehension, because in this case you need to carry state (the set built so far) through each step:
sets:to_list(
lists:foldl(fun(A, S1) ->
lists:foldl(fun(B, S2) ->
sets:add_element(A+B, S2)
end, S1, SomeListA)
end, sets:new(), SomeListB)).
This solution keeps it relatively fast and makes use of as much pre-written library code as possible.
Note that I use lists:zip/2 here rather than numeric +, only to illustrate that this approach works for any kind of non-repeating permutation of a unique list. You may only care about arithmetic, but if you want more, this can do it.
-export([permute_unique/1]).
permute_unique([]) ->
[];
permute_unique([A|Ab]) ->
lists:zip(lists:duplicate(length(Ab)+1, A), [A|Ab])
++
permute_unique(Ab).
%to sum integers, replace the lists:zip... line with
% [B+C || {B,C} <- lists:zip(lists:duplicate(length(Ab)+1, A), [A|Ab])]
%to perform normal arithmetic and yield a numeric value for each element
I am not sure what you consider gigantic - you will end up with N*(N+1)/2 total elements in the permuted list for a unique list of N original elements, so this gets big really fast.
I did some basic performance testing of this, using an Intel (Haswell) Core i7 @ 4GHz with 32GB of memory, running Erlang/OTP 17 64-bit.
5001 elements in the list took between 2 and 5 seconds according to timer:tc/1.
10001 elements in the list took between 15 and 17 seconds, and required about 9GB of memory. This generates a list of 50,015,001 elements.
15001 elements in the list took between 21 and 25 seconds, and required about 19GB of memory.
20001 elements in the list took 49 seconds in one run, and peaked at about 30GB of memory, with about 200 million elements in the result. That is the limit of what I can test.
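The element counts quoted above follow directly from the N*(N+1)/2 formula. A small sketch for checking them (the module and function name pair_count are made up for illustration):

```erlang
-module(pair_count).
-export([pair_count/1]).

%% Number of pairs produced for a unique list of N elements, where each
%% element is paired with itself and with every later element.
pair_count(N) when is_integer(N), N >= 0 ->
    N * (N + 1) div 2.
```

For N = 10001 this gives 50015001, matching the figure above, and for N = 20001 it gives 200030001, which is the "about 200 million" of the last run.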
I want to convert [z,z,a,z,z,a,a,z] to [{z,2},{a,1},{z,2},{a,2},{z,1}]. How can I do it?
So I need to accumulate the previous value, its counter, and the list of tuples.
I've created a record:
-record(acc, {previous, counter, tuples}).
and redefined the function:
listToTuples([]) -> [];
listToTuples([H | Tail]) ->
Acc = #acc{previous=H, counter=1},
listToTuples([Tail], Acc).
But then I am stuck on how to fill in the two branches:
listToTuples([H | Tail], Acc) ->
case H == Acc#acc.previous of
true ->
false ->
end.
If you build up your answer (Acc) in reverse, the previous element will be at the head of that list.
Here's how I would do it:
list_pairs(List) -> list_pairs(List, []).
list_pairs([], Acc) -> lists:reverse(Acc);
list_pairs([H|T], [{H, Count}|Acc]) -> list_pairs(T, [{H, Count+1}|Acc]);
list_pairs([H|T], Acc) -> list_pairs(T, [{H, 1}|Acc]).
(I expect someone will now follow with a one-line list comprehension version.)
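I would continue down the road of matching against the head of the accumulator. Note that X appears twice in the head of the first clause, so that clause only matches when the current element equals the key of the most recent tuple. Because lists:foldr/3 walks the list from right to left, the result comes out in the original order and no lists:reverse/1 is needed.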
F = fun(X,[{X,N}|Rest]) -> [{X,N+1}|Rest];
(X,Rest) -> [{X,1}|Rest] end.
lists:foldr(F,[],List).
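For long lists, a tail-recursive variant of the same idea may be preferable. This is a sketch, not the answerer's code: it uses lists:foldl/3 with the same two-clause fun and reverses once at the end. The module name rle and function name encode are hypothetical.

```erlang
-module(rle).
-export([encode/1]).

%% Run-length encode a list into {Element, Count} tuples.
encode(List) ->
    Step = fun(X, [{X, N} | Rest]) -> [{X, N + 1} | Rest];  % same as previous element
              (X, Acc)             -> [{X, 1} | Acc]        % a new run starts
           end,
    %% foldl walks left to right, so the runs accumulate in reverse order.
    lists:reverse(lists:foldl(Step, [], List)).
```

Called as rle:encode([z,z,a,z,z,a,a,z]), this returns [{z,2},{a,1},{z,2},{a,2},{z,1}], the output asked for in the question.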
I would personally use lists:foldr/3 or do it by hand with something like:
list_to_tuples([H|T]) -> list_to_tuples(T, H, 1);
list_to_tuples([]) -> [].
list_to_tuples([H|T], H, C) -> list_to_tuples(T, H, C+1);
list_to_tuples([H|T], P, C) -> [{P,C}|list_to_tuples(T, H, 1)];
list_to_tuples([], P, C) -> [{P,C}].
Using two accumulators saves you unnecessarily building and pulling apart a tuple for every element in the list. I find writing it this way clearer.
The following is an Erlang function. I don't understand how the lists:map function is used here.
Could someone please explain?
% perform M runs with N calls to F in each run.
% For each of the M runs, determine the average time per call.
% Return, the average and standard deviation of these M results.
time_it(F, N, M) ->
G = fun() -> F(), ok end,
NN = lists:seq(1, N),
MM = lists:seq(1, M),
T = lists:map(
fun(_) ->
T0 = now(), % start timer
[ G() || _ <- NN ], % make N calls to F
1.0e-6*timer:now_diff(now(), T0)/N % average time per call
end,
MM
),
{ avg(T), std(T) }.
Thanks.
Also, I don't know the proper syntax for calling this function. For example, I have a dummy() function that takes 1 parameter, and I get an error when trying to time it:
moduleName:time_it(moduleName:dummy/1, 10, 100).
The above evaluates to an illegal expression.
With the correct syntax, the function can be invoked like this:
moduleName:time_it(fun moduleName:dummy/1, 10, 100).
However, it then throws an exception saying that dummy was invoked without any parameter. I think this line is the culprit: [ G() || _ <- NN ]. I have no idea how to fix it.
map is used here to execute the function
T0 = now(), % start timer
[ G() || _ <- NN ], % make N calls to F
1.0e-6*timer:now_diff(now(), T0)/N % average time per call
for each element of MM. map will return a new list of the same size, where each element of the new list is the result of applying the above function to the corresponding element of MM.
You can invoke time_it like:
moduleName:time_it(fun moduleName:dummy/1, 10, 100).
The purpose of lists:map in the time_it function is just to run the inner function M times. When you see this pattern:
L = lists:seq(1,M),
lists:map(fun(_)-> Foo() end, L)
It just means: call Foo() M times and return the result of each call in a list. It actually builds the list of integers [1,2,3,...,M] and then calls Foo() once for each member of that list.
The author of time_it does this same trick again, because time_it needs to call the function you give it N*M times. So inside the outer loop that runs M times they use a different technique to run the inner loop N times:
L = lists:seq(1,N),
[Foo() || _ <- L]
This has exactly the same result as the code above, but this time Foo is called N times.
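The two repetition techniques described above can be compared side by side. A minimal sketch, with hypothetical module and function names chosen only for illustration:

```erlang
-module(repeat_demo).
-export([with_map/2, with_comprehension/2]).

%% Call Foo() M times via lists:map/2; the list element itself is ignored.
with_map(M, Foo) ->
    lists:map(fun(_) -> Foo() end, lists:seq(1, M)).

%% Call Foo() N times via a list comprehension; again the element is ignored.
with_comprehension(N, Foo) ->
    [Foo() || _ <- lists:seq(1, N)].
```

Both return a list with one result per call, so repeat_demo:with_map(3, fun() -> ok end) and repeat_demo:with_comprehension(3, fun() -> ok end) each give [ok,ok,ok].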
The reason you are having trouble using time_it with your dummy function is that time_it takes a function with 0 parameters, not 1. So you need to make a dummy function and call it like this:
dummy() ->
%% do something here you want to measure
ok.
measure_dummy() ->
time_it(fun someModule:dummy/0, 10, 100).
If you have a function moduleName:dummy/1, you can do one of the following:
If you can edit time_it/3, make it call F(constant_parameter) instead of F(). I assume you can.
Otherwise, call M1:time_it(fun() -> M2:dummy(constant_parameter) end, N, M).
Either way, dummy is not called directly, but only through F inside time_it.
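The second option works because a closure can capture the argument and present a zero-argument interface. A small sketch, where dummy/1 and as_thunk/1 are hypothetical stand-ins rather than functions from the question:

```erlang
-module(wrap_demo).
-export([dummy/1, as_thunk/1]).

%% Hypothetical one-argument function standing in for the real dummy/1.
dummy(X) -> X * 2.

%% Wrap a call to dummy/1 in a fun of arity 0, which is the shape that
%% time_it/3 expects for its first argument.
as_thunk(Arg) ->
    fun() -> dummy(Arg) end.
```

The fun returned by wrap_demo:as_thunk(21) takes no arguments and returns 42 when called, so it can be passed wherever a zero-arity fun is required.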
results(N, F) when N >= 0 -> results(N, F, []).
results(0, _, Acc) -> lists:reverse(Acc);
results(N, F, Acc) -> results(N-1, F, [F() | Acc]).
repeat(0, _F) -> ok;
repeat(N, F) when N > 0 ->
F(),
repeat(N-1, F).
With these:
T = results(M, fun () ->
T0 = now(),
repeat(N, G),
1.0e-6 * timer:now_diff(now(), T0)/N
end)
Make sense, now?
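Putting the pieces together, a self-contained version might look like the sketch below. Several things here are assumptions rather than the original author's code: now/0 has been deprecated since OTP 18, so erlang:monotonic_time/1 is swapped in; avg/1 and std/1 are not shown in the question, so simple definitions (mean and population standard deviation) are supplied; and the module name timing is made up.

```erlang
-module(timing).
-export([time_it/3, avg/1, std/1]).

%% Perform M runs with N calls to F in each run.
%% Returns {Average, StdDev} of the per-call time in seconds over the M runs.
time_it(F, N, M) when is_function(F, 0), N > 0, M > 0 ->
    Run = fun(_) ->
              T0 = erlang:monotonic_time(microsecond),  % start timer
              repeat(N, F),                             % make N calls to F
              T1 = erlang:monotonic_time(microsecond),
              1.0e-6 * (T1 - T0) / N                    % average seconds per call
          end,
    Times = lists:map(Run, lists:seq(1, M)),
    {avg(Times), std(Times)}.

%% Call F() N times, discarding the results.
repeat(0, _F) -> ok;
repeat(N, F) when N > 0 ->
    F(),
    repeat(N - 1, F).

%% Assumed definition: arithmetic mean.
avg(Xs) -> lists:sum(Xs) / length(Xs).

%% Assumed definition: population standard deviation.
std(Xs) ->
    Mean = avg(Xs),
    math:sqrt(lists:sum([(X - Mean) * (X - Mean) || X <- Xs]) / length(Xs)).
```

A one-argument function can then be timed with timing:time_it(fun() -> moduleName:dummy(constant_parameter) end, 10, 100).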