Suppose I have a map like this
A = #{a=>1,b=>2,c=>3}.
I want to create a function which converts A to a list of tuples of key-value pairs.
List = [{a,1},{b,2},{c,3}]
maps:to_list/1 does exactly this:
1> maps:to_list(#{a=>1,b=>2,c=>3}).
[{a,1},{b,2},{c,3}]
You can also use maps:fold/3 to iterate over the items of a map. If you only need to convert the map, something like this works:
1> A = #{a=>1,b=>2,c=>3}.
2> maps:fold(
fun(K, V, Acc) ->
[{K, V} | Acc]
end,
[], A).
[{c,3},{b,2},{a,1}]
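The order of the pairs in the results above comes from how the map happens to be traversed. As far as I know, the maps documentation does not promise any particular order for maps:to_list/1 or maps:fold/3. If a stable order matters, a minimal sketch is to sort the pairs by key afterwards (the module and function names here are made up for illustration):

```erlang
-module(map_pairs).
-export([to_sorted_list/1]).

%% Convert a map to a list of {Key, Value} tuples sorted by key.
to_sorted_list(Map) when is_map(Map) ->
    lists:keysort(1, maps:to_list(Map)).
```

Calling map_pairs:to_sorted_list(#{c => 3, a => 1, b => 2}) would give the pairs in key order regardless of how the map is stored internally.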
To do the same for nested maps, the example can be modified like this:
1> A = #{a => 1, b => 2, c => 3, d => #{a => 1, b => #{a => 1}}}.
2> Nested =
       fun F(K, V = #{}, Acc) -> [{K, maps:fold(F, [], V)} | Acc];
           F(K, V, Acc) -> [{K, V} | Acc]
       end.
3> maps:fold(Nested, [], A).
[{d,[{b,[{a,1}]},{a,1}]},{c,3},{b,2},{a,1}]
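The same nested conversion can also be written as an ordinary recursive function in a module, which is easier to reuse than a fun typed into the shell. This is only a sketch: the module name nested_pairs is hypothetical, and the key sort is added so that the output does not depend on map traversal order.

```erlang
-module(nested_pairs).
-export([to_list/1]).

%% Recursively convert a map (and any maps nested as values) into
%% key-sorted lists of {Key, Value} tuples. Non-map values are kept as is.
to_list(Map) when is_map(Map) ->
    Pairs = [{K, to_list(V)} || {K, V} <- maps:to_list(Map)],
    lists:keysort(1, Pairs);
to_list(Value) ->
    Value.
```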
I'm looking to optimize my solution for the maximum Collatz sequence problem in Erlang. Right now I've tried using ETS, and the following solution uses maps, but I'm getting worse performance than I feel I should. Is there perhaps some optimization I could do to improve it?
-module(collatzMaps).
-export([start/2, s/4]).

collatz(0, Map) ->
    {0, Map};
collatz(M, Map) ->
    Exists = maps:is_key(M, Map),
    case Exists of
        false ->
            case M rem 2 == 0 of
                true ->
                    Result = collatz(M div 2, Map),
                    Val = (1 + element(1, Result)),
                    Map1 = maps:put(M, Val, element(2, Result)),
                    {maps:get(M, Map1), Map1};
                false ->
                    Result = collatz((3 * M + 1), Map),
                    Val = (1 + element(1, Result)),
                    Map2 = maps:put(M, Val, element(2, Result)),
                    {maps:get(M, Map2), Map2}
            end;
        true ->
            {maps:get(M, Map), Map}
    end.

s(N, M, Max, Map) ->
    if
        N =< M ->
            Result = collatz(N, Map),
            if
                element(1, Result) > Max ->
                    NextMax = element(1, Result),
                    MapNext = element(2, Result),
                    s(N + 1, M, NextMax, MapNext);
                true ->
                    MapNext = element(2, Result),
                    s(N + 1, M, Max, MapNext)
            end;
        true ->
            Max
    end.

start(N, M)->
    statistics(runtime),
    statistics(wall_clock),
    Map = maps:new(),
    Map1 = maps:put(1, 1, Map),
    G = s(N, M, 0, Map1),
    {_, Time2} = statistics(wall_clock),
    U2 = Time2 / 1000,
    io:format("~p seconds~n", [U2]),
    G.
First, let's tweak the invocation so that we can gather some simple statistics and compare different approaches:
-export([start/2, max_collatz/2]).
...
max_collatz(N, M) ->
    Map = maps:new(),
    Map1 = maps:put(1, 1, Map),
    s(N, M, 0, Map1).

start(N, M)->
    {T, Result} = timer:tc( fun() -> max_collatz(N, M) end),
    io:format("~p seconds~n", [T / 1000000]),
    Result.
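A single timer:tc/1 measurement can be noisy, so for a quick look it may help to repeat the call a few times and take the median. The helper below is a rough sketch, not a replacement for a proper statistics tool; the module name tc_bench and the function median_micros/2 are invented for this example.

```erlang
-module(tc_bench).
-export([median_micros/2]).

%% Run the zero-arity Fun Runs times with timer:tc/1 and return the
%% (lower) median of the measured times, in microseconds.
median_micros(Fun, Runs) when is_function(Fun, 0), is_integer(Runs), Runs > 0 ->
    Times = [element(1, timer:tc(Fun)) || _ <- lists:seq(1, Runs)],
    lists:nth((Runs + 1) div 2, lists:sort(Times)).
```

For example, tc_bench:median_micros(fun() -> collatzMaps:max_collatz(1, 100000) end, 11) would give one number per implementation to compare.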
So let's write it in a more idiomatic Erlang way:
-module(collatz).
-export([start/2, max_collatz/2]).

collatz_next(N) when N rem 2 =:= 0 ->
    N div 2;
collatz_next(N) ->
    3 * N + 1.

%% Return {Length, UpdatedMap}; the map caches the lengths computed so far.
collatz_length(N, Map) ->
    case Map of
        #{N := L} -> {L, Map};
        _ ->
            {L, Map2} = collatz_length(collatz_next(N), Map),
            {L + 1, Map2#{N => L + 1}}
    end.

max_collatz(N, M) ->
    Map = lists:foldl(fun(X, Acc) -> {_, Acc2} = collatz_length(X, Acc), Acc2 end,
                      #{1 => 1}, lists:seq(N, M)),
    lists:max(maps:values(Map)).

start(N, M) ->
    {T, Result} = timer:tc(fun() -> max_collatz(N, M) end),
    io:format("~p seconds~n", [T / 1000000]),
    Result.
Then we can compare the speed using, for example, eministat.
Clone and build it:
git clone https://github.com/jlouis/eministat.git
cd eministat
make
If you run into a problem like this:
DEPEND eministat.d
ERLC eministat.erl eministat_analysis.erl eministat_ds.erl eministat_plot.erl eministat_report.erl eministat_resample.erl eministat_ts.erl
compile: warnings being treated as errors
src/eministat_resample.erl:8: export_all flag enabled - all functions will be exported
erlang.mk:4940: recipe for target 'ebin/eministat.app' failed
make[1]: *** [ebin/eministat.app] Error 1
erlang.mk:4758: recipe for target 'app' failed
make: *** [app] Error 2
you can fix it with this patch:
diff --git src/eministat_resample.erl src/eministat_resample.erl
index 1adf401..0887b2c 100644
--- src/eministat_resample.erl
+++ src/eministat_resample.erl
@@ -5,7 +5,7 @@
 -include("eministat.hrl").
 -export([resample/3, bootstrap_bca/3]).
 
--compile(export_all).
+-compile([nowarn_export_all, export_all]).
 
 %% @doc resample/3 is the main resampler of eministat
 %% @end
Then run it:
$ erl -pa eministat/ebin/
Erlang/OTP 21 [erts-10.1] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]
Eshell V10.1 (abort with ^G)
1> c(collatzMaps), c(collatz).
{ok,collatz}
2> eministat:x(95.0, eministat:s(orig, fun() -> collatzMaps:max_collatz(1, 100000) end, 30), eministat:s(new, fun() -> collatz:max_collatz(1, 100000) end, 30)).
x orig
+ new
+--------------------------------------------------------------------------+
|+ ++++++++ +++++ * + +x+**+xxxx**x xxx xx+x xxx *x x + x x|
| + + + x x xx x |
| + |
| |_______M___A__________| |
| |________M_____A______________| |
+--------------------------------------------------------------------------+
Dataset: x N=30 CI=95.0000
Statistic Value [ Bias] (Bootstrapped LB‥UB)
Min: 1.76982e+5
1st Qu. 1.81610e+5
Median: 1.82954e+5
3rd Qu. 1.87030e+5
Max: 1.94944e+5
Average: 1.84280e+5 [ 8.00350] ( 1.82971e+5 ‥ 1.85749e+5)
Std. Dev: 3999.87 [ -102.524] ( 3128.74 ‥ 5431.13)
Outliers: 0/0 = 0 (μ=1.84288e+5, σ=3897.35)
Outlier variance: 3.22222e-2 (slight)
------
Dataset: + N=30 CI=95.0000
Statistic Value [ Bias] (Bootstrapped LB‥UB)
Min: 1.69179e+5
1st Qu. 1.72501e+5
Median: 1.74614e+5
3rd Qu. 1.79850e+5
Max: 1.90638e+5
Average: 1.76517e+5 [ 3.11862] ( 1.74847e+5 ‥ 1.78679e+5)
Std. Dev: 5343.46 [ -147.802] ( 4072.99 ‥ 7072.53)
Outliers: 0/0 = 0 (μ=1.76520e+5, σ=5195.66)
Outlier variance: 9.43164e-2 (slight)
Difference at 95.0% confidence
-7762.60 ± 2439.69
-4.21240% ± 1.32391%
(Student's t, pooled s = 4719.72)
------
ok
So it seems to be about 4% faster now, which is not much. First, we can inline collatz_next/1, which is basically what you have in your collatz/2 function. I like to be explicit, so I put this between -export and the first function:
-compile({inline, [collatz_next/1]}).
It has very little effect:
Difference at 95.0% confidence
-9895.27 ± 5524.91
-5.24520% ± 2.92860%
(Student's t, pooled s = 1.06882e+4)
Then we can try unrolling lists:foldl/3, lists:seq/2 and lists:max/1 by hand, as in your s/4 function, but let's do it in a more idiomatic way.
max_collatz(N, M) ->
    max_collatz(N, M, 1, #{1 => 1}).

%% Walk the inclusive range N..M, keeping the longest length seen so far.
max_collatz(N, M, Max, _) when N > M -> Max;
max_collatz(N, M, Max, Map) ->
    case collatz_length(N, Map) of
        {L, Map2} when L > Max ->
            max_collatz(N + 1, M, L, Map2);
        {_, Map2} ->
            max_collatz(N + 1, M, Max, Map2)
    end.
Well, it's better, but still not by much:
Difference at 95.0% confidence
-1.78775e+4 ± 1980.35
-9.66832% ± 1.07099%
Now that we have removed all external calls, it's worth trying native compilation with HiPE (an external function call usually ruins any benefit of native compilation). We could also add a small type hint for HiPE, but it seems to have barely any effect here (it is usually worth trying for floating-point arithmetic, which is not the case here, and the heavy use of maps is probably a problem as well).
max_collatz(N, M) when N < M, is_integer(N), is_integer(M) ->
max_collatz(N, M, 1, #{1 => 1}).
Not much better
c(collatz, [native]).
...
Difference at 95.0% confidence
-2.26703e+4 ± 2651.32
-12.1721% ± 1.42354%
(Student's t, pooled s = 5129.13)
So it's time to try something dirty. The process dictionary is not the recommended place to store your data, but if it is used inside a dedicated process it is an acceptable solution.
collatz_length(N) ->
    case get(N) of
        undefined ->
            L = collatz_length(collatz_next(N)),
            put(N, L + 1),
            L + 1;
        L -> L
    end.

max_collatz(N, M) when N < M, is_integer(N), is_integer(M) ->
    P = self(),
    %% The worker's process dictionary is the cache, so the caller's
    %% own dictionary stays untouched.
    W = spawn_link(fun() ->
                       put(1, 1),
                       P ! {self(), max_collatz(N, M, 1)}
                   end),
    receive {W, Max} -> Max end.

%% Walk the inclusive range N..M, keeping the longest length seen so far.
max_collatz(N, M, Max) when N > M -> Max;
max_collatz(N, M, Max) ->
    case collatz_length(N) of
        L when L > Max ->
            max_collatz(N + 1, M, L);
        _ ->
            max_collatz(N + 1, M, Max)
    end.
Yes, it's a dirty but working solution, and it's worth it (even without native):
Difference at 95.0% confidence
-1.98173e+5 ± 5450.92
-80.9384% ± 2.22628%
(Student's t, pooled s = 1.05451e+4)
So here we are, from 3.6 s down to 0.93 s using some dirty tricks. Anyway, if you did this sort of task for real, you would probably use a NIF written in C. It is not the type of task where Erlang shines.
> collatzMaps:start(1, 1000000).
3.576669 seconds
525
> collatz:start(1, 1000000).
0.931186 seconds
525
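When optimizing like this, it is easy to introduce an off-by-one in the range or in the length convention without noticing. A small brute-force reference, with no caching at all, can be used to cross-check any of the faster versions on a small range. This is a sketch under the same convention as above (the sequence starting at 1 has length 1); the module name collatz_check is hypothetical.

```erlang
-module(collatz_check).
-export([chain_length/1, max_length/2]).

%% Number of terms in the Collatz sequence starting at N,
%% counting both N itself and the final 1.
chain_length(N) when is_integer(N), N >= 1 ->
    chain_length(N, 1).

chain_length(1, Acc) -> Acc;
chain_length(N, Acc) when N rem 2 =:= 0 -> chain_length(N div 2, Acc + 1);
chain_length(N, Acc) -> chain_length(3 * N + 1, Acc + 1).

%% Longest chain length over the inclusive range From..To, no memoization.
max_length(From, To) when is_integer(From), is_integer(To), From =< To ->
    lists:max([chain_length(N) || N <- lists:seq(From, To)]).
```

A check such as collatz_check:max_length(1, 1000) =:= collatz:max_collatz(1, 1000) should then hold for each variant being benchmarked.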
I am struggling a bit with extracting fields from a binary message. The raw message looks like the following:
<<1,0,97,98,99,100,0,0,0,3,0,0,0,0,0,0,0,0,0,3,32,3,0,0,88,2,0,0>>
I know the order, types, and static sizes of the fields; some have arbitrary sizes though, so I am trying to do something like the following:
newobj(Data) ->
io:fwrite("NewObj RAW ~p~n",[Data]),
NewObj = {obj,rest(uint16(string(uint16({[],Data},id),type),parent),unparsed)},
io:fwrite("NewObj ~p~n",[NewObj]),
NewObj.
uint16/2, string/2, and rest/2 are actually extraction functions and look like this:
uint16(ListData, Name) ->
{List, Data} = ListData,
case Data of
<<Int:2/little-unsigned-unit:8, Rest/binary>> ->
{List ++ [{Name,Int}], Rest};
<<Int:2/little-unsigned-unit:8>> ->
List ++ [{Name,Int}]
end.
string(ListData, Name) ->
{List, Data} = ListData,
Split = binary:split(Data,<<0>>),
String = lists:nth(1, Split),
if
length(Split) == 2 ->
{List ++ [{Name, String}], lists:nth(2, Split)};
true ->
List ++ [{Name, String}]
end.
rest(ListData, Name) ->
{List, Data} = ListData,
List ++ [{Name, Data}].
This works and looks like:
NewObj RAW <<1,0,97,98,99,100,0,0,0,3,0,0,0,0,0,0,0,0,0,3,32,3,0,0,88,2,0,0>>
NewObj {obj,[{id,1},
{type,<<"abcd">>},
{parent,0},
{unparsed,<<3,0,0,0,0,0,0,0,0,0,3,32,3,0,0,88,2,0,0>>}]}
The reason for this question, though, is that passing {List, Data} as ListData and then splitting it inside the function with {List, Data} = ListData feels clumsy. Is there a better way? I think I can't use static matching because the "unparsed" and "type" parts are of arbitrary length, so it's not possible to define their respective sizes.
Thanks!
---------------Update-----------------
Trying to take comments below into account - code now looks like the following:
newobj(Data) ->
io:fwrite("NewObj RAW ~p~n",[Data]),
NewObj = {obj,field(
field(
field({[], Data},id,fun uint16/1),
type, fun string/1),
unparsed,fun rest/1)},
io:fwrite("NewObj ~p~n",[NewObj]).
field({List, Data}, Name, Func) ->
{Value,Size} = Func(Data),
case Data of
<<_:Size/binary-unit:8>> ->
[{Name,Value}|List];
<<_:Size/binary-unit:8, Rest/binary>> ->
{[{Name,Value}|List], Rest}
end.
uint16(Data) ->
case Data of
<<UInt16:2/little-unsigned-unit:8, _/binary>> ->
{UInt16,2};
<<UInt16:2/little-unsigned-unit:8>> ->
{UInt16,2}
end.
string(Data) ->
Split = binary:split(Data,<<0>>),
case Split of
[String, Rest] ->
{String,byte_size(String)+1};
[String] ->
{String,byte_size(String)+1}
end.
rest(Data) ->
{Data,byte_size(Data)}.
The code is not idiomatic, and some pieces cannot compile as is :-) Here are some comments:
The newobj/1 function refers to a NewObj variable that is unbound. Probably the real code is something like NewObj = {obj,rest(... ?
The code uses list append (++) multiple times. This should be avoided where possible because each append copies the whole left-hand list. The idiomatic way is to add to the head of the list as many times as needed (that is: L2 = [NewThing | L1]) and call lists:reverse/1 at the very end. See any Erlang book or the free Learn You Some Erlang for the details.
In a similar vein, lists:nth/2 should be avoided and replaced by pattern matching, or by a different way to construct the list or parse the binary.
Dogbert's suggestion about doing the pattern matching directly in the function arguments is a good idiomatic approach and lets you remove some lines from the code.
As a last suggestion, regarding the approach to debugging: consider replacing the fwrite calls with proper unit tests.
I hope this gives some hints about what to look at. Feel free to append the code changes to your question; we can proceed from there.
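To make the comments above concrete, here is a rough sketch of the original extraction functions with the {List, Data} pair matched directly in the function head, with prepending instead of ++, and with a single lists:reverse/1 at the end. The module name binfields is made up, and the sketch assumes well-formed input (it crashes on anything else).

```erlang
-module(binfields).
-export([uint16/2, string/2, rest/2]).

%% Each step takes {Acc, Binary} and returns the same shape,
%% with the new field prepended to Acc.
uint16({Acc, <<Int:16/little-unsigned, Rest/binary>>}, Name) ->
    {[{Name, Int} | Acc], Rest}.

%% NUL-terminated string: split once on the first 0 byte.
string({Acc, Data}, Name) ->
    [String, Rest] = binary:split(Data, <<0>>),
    {[{Name, String} | Acc], Rest}.

%% Last step: store whatever is left and restore the field order.
rest({Acc, Data}, Name) ->
    lists:reverse([{Name, Data} | Acc]).
```

The nested call from the question then keeps the same form: rest(uint16(string(uint16({[], Data}, id), type), parent), unparsed).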
EDIT
It's looking better. Let's see if we can simplify. Please note that we are doing the work backwards, because we are adding tests after the production code has been written, instead of doing test-driven development.
Step 1: add test.
I also reversed the order of the list because it looks more natural.
-include_lib("eunit/include/eunit.hrl").
happy_input_test() ->
Rest = <<3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 32, 3, 0, 0, 88, 2, 0, 0>>,
Input = <<1, 0,
97, 98, 99, 100, 0,
0, 0,
Rest/binary>>,
Expected = {obj, [{id, 1}, {type, <<"abcd">>}, {parent, 0}, {unparsed, Rest}]},
?assertEqual(Expected, binparse:newobj(Input)).
We can run this, among other ways, with rebar3 eunit (see the rebar3 documentation; I suggest starting with rebar3 new lib mylib to create a skeleton).
Step 2: the absolute minimum
Your description is not enough to understand which fields are mandatory and which are optional and whether there is always something more after the obj.
In the simplest possible case, all your code can be reduced to:
newobj(Bin) ->
<<Id:16/little-unsigned, Rest/binary>> = Bin,
[Type, Rest2] = binary:split(Rest, <<0>>),
<<Parent:16/little-unsigned, Rest3/binary>> = Rest2,
{obj, [{id, Id}, {type, Type}, {parent, Parent}, {unparsed, Rest3}]}.
Quite compact :-)
I find the encoding of the string very bizarre: a binary encoding where the string is NUL-terminated (which forces us to walk the binary) instead of being encoded with, say, 2 or 4 bytes representing the length, followed by the string itself.
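For comparison, this is roughly what a length-prefixed string would look like. The format here (a 2-byte little-endian length followed by the bytes) is purely hypothetical and is not the format of the message in the question; the module name lenstring and the bad_string atom are also invented. The point is that the whole field can be matched in one binary pattern, without scanning for a terminator.

```erlang
-module(lenstring).
-export([encode/1, decode/1]).

%% Prefix the string with its byte size as a 16-bit little-endian integer.
encode(Str) when is_binary(Str), byte_size(Str) < 65536 ->
    <<(byte_size(Str)):16/little-unsigned, Str/binary>>.

%% Len is bound by the first segment and used as the size of the second.
decode(<<Len:16/little-unsigned, Str:Len/binary, Rest/binary>>) ->
    {ok, Str, Rest};
decode(_) ->
    {error, bad_string}.
```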
Step 3: input validation
Since we are parsing a binary, this is probably coming from the outside of our system. As such, the let it crash philosophy doesn't apply and we have to perform full input validation.
I make the assumption that all fields are mandatory except unparsed, which can be empty.
missing_unparsed_is_ok_test() ->
Input = <<1, 0,
97, 98, 99, 100, 0,
0, 0>>,
Expected = {obj, [{id, 1}, {type, <<"abcd">>}, {parent, 0}, {unparsed, <<>>}]},
?assertEqual(Expected, binparse:newobj(Input)).
The simple implementation above passes it.
Step 4: malformed parent
We add the tests and make an API decision: the function will return an error tuple.
missing_parent_is_error_test() ->
Input = <<1, 0,
97, 98, 99, 100, 0>>,
?assertEqual({error, bad_parent}, binparse:newobj(Input)).
malformed_parent_is_error_test() ->
Input = <<1, 0,
97, 98, 99, 100, 0,
0>>,
?assertEqual({error, bad_parent}, binparse:newobj(Input)).
We change the implementation to pass the tests:
newobj(Bin) ->
<<Id:16/little-unsigned, Rest/binary>> = Bin,
[Type, Rest2] = binary:split(Rest, <<0>>),
case Rest2 of
<<Parent:16/little-unsigned, Rest3/binary>> ->
{obj, [{id, Id}, {type, Type}, {parent, Parent}, {unparsed, Rest3}]};
Rest2 ->
{error, bad_parent}
end.
Step 5: malformed type
The new tests:
missing_type_is_error_test() ->
Input = <<1, 0>>,
?assertEqual({error, bad_type}, binparse:newobj(Input)).
malformed_type_is_error_test() ->
Input = <<1, 0,
97, 98, 99, 100>>,
?assertEqual({error, bad_type}, binparse:newobj(Input)).
We could be tempted to change the implementation as follows:
newobj(Bin) ->
<<Id:16/little-unsigned, Rest/binary>> = Bin,
case binary:split(Rest, <<0>>) of
[Type, Rest2] ->
case Rest2 of
<<Parent:16/little-unsigned, Rest3/binary>> ->
{obj, [
{id, Id}, {type, Type},
{parent, Parent}, {unparsed, Rest3}
]};
Rest2 ->
{error, bad_parent}
end;
[Rest] -> {error, bad_type}
end.
Which is an unreadable mess. Just adding functions doesn't help us:
newobj(Bin) ->
<<Id:16/little-unsigned, Rest/binary>> = Bin,
case parse_type(Rest) of
{ok, {Type, Rest2}} ->
case parse_parent(Rest2) of
{ok, Parent, Rest3} ->
{obj, [
{id, Id}, {type, Type},
{parent, Parent}, {unparsed, Rest3}
]};
{error, Reason} -> {error, Reason}
end;
{error, Reason} -> {error, Reason}
end.
parse_type(Bin) ->
case binary:split(Bin, <<0>>) of
[Type, Rest] -> {ok, {Type, Rest}};
[Bin] -> {error, bad_type}
end.
parse_parent(Bin) ->
case Bin of
<<Parent:16/little-unsigned, Rest/binary>> -> {ok, Parent, Rest};
Bin -> {error, bad_parent}
end.
This is a classic problem in Erlang with nested conditionals.
Step 6: regaining sanity
Here is my approach, quite generic so applicable (I think) to many domains. The overall idea is taken from backtracking, as explained in http://rvirding.blogspot.com/2009/03/backtracking-in-erlang-part-1-control.html
We create one function per parse step and pass them, as a list, to call_while_ok/3:
newobj(Bin) ->
    Parsers = [fun parse_id/1,
               fun parse_type/1,
               fun parse_parent/1,
               fun(X) -> {ok, {unparsed, X}, <<>>} end
              ],
    case call_while_ok(Parsers, Bin, []) of
        {error, Reason} -> {error, Reason};
        PropList -> {obj, PropList}
    end.
Function call_while_ok/3 is somewhat related to lists:foldl and lists:filter:
call_while_ok([F], Seed, Acc) ->
    case F(Seed) of
        {ok, Value, _NextSeed} -> lists:reverse([Value | Acc]);
        {error, Reason} -> {error, Reason}
    end;
call_while_ok([F | Fs], Seed, Acc) ->
    case F(Seed) of
        {ok, Value, NextSeed} -> call_while_ok(Fs, NextSeed, [Value | Acc]);
        {error, Reason} -> {error, Reason}
    end.
And here are the parsing functions. Note that their signature is always the same:
parse_id(Bin) ->
    <<Id:16/little-unsigned, Rest/binary>> = Bin,
    {ok, {id, Id}, Rest}.

parse_type(Bin) ->
    case binary:split(Bin, <<0>>) of
        [Type, Rest] -> {ok, {type, Type}, Rest};
        [Bin] -> {error, bad_type}
    end.

parse_parent(Bin) ->
    case Bin of
        <<Parent:16/little-unsigned, Rest/binary>> ->
            {ok, {parent, Parent}, Rest};
        Bin -> {error, bad_parent}
    end.
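Note that parse_id/1 above still crashes with a badmatch when the input is shorter than two bytes, while the other two parsers return an error tuple. If the id should be validated in the same way, a possible variant is sketched below. The bad_id reason is my own choice and is not covered by the tests above, and the separate module name binparse_id is only there to keep the example self-contained.

```erlang
-module(binparse_id).
-export([parse_id/1]).

%% Same signature as the other parse steps: {ok, Field, Rest} or {error, Reason}.
parse_id(<<Id:16/little-unsigned, Rest/binary>>) ->
    {ok, {id, Id}, Rest};
parse_id(_) ->
    {error, bad_id}.
```

Because it returns the same shapes as parse_type/1 and parse_parent/1, it can take the place of parse_id/1 in the Parsers list without touching call_while_ok/3.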
Step 7: homework
The list [{id, 1}, {type, <<"abcd">>}, {parent, 0}, {unparsed, Rest}] is a proplist (see Erlang documentation), which predates Erlang maps.
Have a look at the documentation for maps and see if it makes sense to return a map instead.
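As a starting point for that exercise, here is one possible sketch: since the result is a list of two-element tuples, maps:from_list/1 can convert it directly. The wrapper module obj_map and the function to_map/1 are hypothetical names, and keeping the obj tag around the map is just one option.

```erlang
-module(obj_map).
-export([to_map/1]).

%% Turn the {obj, PropList} result into {obj, Map}; pass errors through.
to_map({obj, PropList}) when is_list(PropList) ->
    {obj, maps:from_list(PropList)};
to_map({error, _Reason} = Error) ->
    Error.
```

With a map, callers can then pattern match on only the fields they need, for example {obj, #{id := Id}} = obj_map:to_map(binparse:newobj(Input)).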
I have data in the form of a tuple:
{data, [[{a, 2}, {b, 3}], [{x, 1}, {v,2}], [1,2,3,4], "hello world", 1111]}
Given the match
{data, L} = {data, [[{a, 2}, {b, 3}], [{x, 1}, {v,2}], [1,2,3,4], "hello world", 1111]}.
I need to check whether each element of L is a proplist; here, [{a, 2}, {b, 3}] and [{x, 1}, {v,2}] are proplists.
Is there any function in Erlang to check whether a list is a proplist?
is_proplist(List) should return true or false.
The function F1 below checks for proplists (assuming that a single atom 'a' is equivalent to {'a',true})
1> F = fun(X) when is_atom(X) -> true; ({X,_}) when is_atom(X) -> true; (_) -> false end.
#Fun<erl_eval.6.80484245>
2> L = [[{a, 2}, {b, 3}], [{x, 1}, {v,2}], [1,2,3,4], "hello world", 1111].
[[{a,2},{b,3}],[{x,1},{v,2}],[1,2,3,4],"hello world",1111]
3> F1 = fun(X) when is_list(X) -> lists:all(F,X); (_) -> false end.
#Fun<erl_eval.6.80484245>
4> [X || X <- L, F1(X)].
[[{a,2},{b,3}],[{x,1},{v,2}]]
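The same check can be packaged as a module function with the is_proplist/1 name asked for in the question. This is a sketch under the same assumptions as the shell version (keys must be atoms, and a bare atom counts as {Atom, true}); the module name proplist_check is hypothetical. Note that the empty list, and therefore the empty string "", counts as a proplist here.

```erlang
-module(proplist_check).
-export([is_proplist/1]).

%% True if the argument is a list whose elements are all atoms
%% or {AtomKey, Value} tuples.
is_proplist(List) when is_list(List) ->
    lists:all(fun is_property/1, List);
is_proplist(_) ->
    false.

is_property(Atom) when is_atom(Atom) -> true;
is_property({Key, _Value}) when is_atom(Key) -> true;
is_property(_) -> false.
```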