Related
I think namespacing of maps (at least in R19) is pretty weird. Consider the example:
14> M = #{a => 2, b => 3, c => 4}.
#{a => 2,b => 3,c => 4}
15> M.
#{a => 2,b => 3,c => 4}
16> map_size(M).
3
17> maps:map_size(M).
** exception error: undefined function maps:map_size/1
18> to_list(M).
** exception error: undefined shell command to_list/1
19> maps:to_list(M).
[{a,2},{b,3},{c,4}]
So, map_size is available in default namespace but not in maps:. However, to_list/1 exhibits opposite behavior. I haven't tried other functions, but even these results are surprising.
Am I missing some important undercurrent here or is this just an example of carelessness in language design?
I see some logic to this. The map_size/1 function is also available as maps:size/1, where both names contain the information you need: it takes a map, and returns the size. On the other hand, the name to_list doesn't say what you're converting from. There are several to_list functions in the default namespace already:
atom_to_list
binary_to_list
float_to_list
integer_to_list
pid_to_list
tuple_to_list
So the inconsistency here is that while "size" is available as map_size/1 and maps:size/1, the function map_to_list is missing. As Dogbert notes in the comments, this is presumably because map_size is available in guard tests, and thus deserves a special place. (I seem to remember that there are functions in other modules that are available in guard tests, but my memory might be deceiving me.)
I'm reading Programming Erlang, in Chapter 5 of the book it says:
Records are just tuples in disguise, so they have the same storage and performance
characteristics as tuples. Maps use more storage than tuples and have
slower lookup properties.
In languages I've learned before, this is not the case. Maps are usually implemented as a Hash Table, so the lookup time complexity is O(1); Records (Tuples with names) are usually implemented as an immutable List, and the lookup time complexity is O(N).
What's different in the implementation of these data structures in Erlang?
There's no real practical performance difference between record lookup and map lookup for small numbers of fields. For large numbers of fields, though, there is, because record information is known at compile time while map keys need not be, so maps use a different lookup mechanism than records. But records and maps are not intended as interchangeable replacements, and so comparing them for use cases involving anything more than a small number of fields is pointless IMO; if you know the fields you need at compile time, use records, but if you don't, use maps or another similar mechanism. Because of this, the following focuses only on the differences in performance of looking up one record field and one map key.
Let's look at the assembler for two functions, one that accesses a record field and one that accesses a map key. Here are the functions:
-record(foo, {f}).
r(#foo{f=X}) ->
X.
m(#{f := X}) ->
X.
Both use pattern matching to extract a value from the given type instance.
Here's the assembly for r/1:
{function, r, 1, 2}.
{label,1}.
{line,[{location,"f2.erl",6}]}.
{func_info,{atom,f2},{atom,r},1}.
{label,2}.
{test,is_tuple,{f,1},[{x,0}]}.
{test,test_arity,{f,1},[{x,0},2]}.
{get_tuple_element,{x,0},0,{x,1}}.
{get_tuple_element,{x,0},1,{x,2}}.
{test,is_eq_exact,{f,1},[{x,1},{atom,foo}]}.
{move,{x,2},{x,0}}.
return.
The interesting part here starts under {label,2}. The code verifies that the argument is a tuple, then verifies the tuple's arity, and extracts two elements from it. After verifying that the first element of the tuple is equal to the atom foo, it returns the value of the second element, which is record field f.
Now let's look at the assembly of the m/1 function:
{function, m, 1, 4}.
{label,3}.
{line,[{location,"f2.erl",9}]}.
{func_info,{atom,f2},{atom,m},1}.
{label,4}.
{test,is_map,{f,3},[{x,0}]}.
{get_map_elements,{f,3},{x,0},{list,[{atom,f},{x,0}]}}.
return.
This code verifies that the argument is a map, then extracts the value associated with map key f.
The costs of each function come down to the costs of the assembly instructions. The record function has more instructions but it's likely they're less expensive than the instructions in the map function because all the record information is known at compile time. This is especially true as the key count for the map increases, since that means the get_map_elements call has to potentially wade through more map data to find what it's looking for.
We can write functions to call these accessors numerous times, then time the new functions. Here are two sets of recursive functions that call the accessors N times:
call_r(N) ->
call_r(#foo{f=1},N).
call_r(_,0) ->
ok;
call_r(F,N) ->
1 = r(F),
call_r(F,N-1).
call_m(N) ->
call_m(#{f => 1},N).
call_m(_,0) ->
ok;
call_m(M,N) ->
1 = m(M),
call_m(M,N-1).
We can call these with timer:tc/3 to check the execution time for each function. Let's call each one ten million times, but do so 50 times and take the average execution time. First, the record function:
1> lists:sum([element(1,timer:tc(f2,call_r,[10000000])) || _ <- lists:seq(1,50)])/50.
237559.02
This means calling the function ten million times took an average of 238ms. Now, the map function:
2> lists:sum([element(1,timer:tc(f2,call_m,[10000000])) || _ <- lists:seq(1,50)])/50.
235871.94
Calling the map function ten million times averaged 236ms per call. Your mileage will vary of course, as did mine; I observed that running each multiple times sometimes resulted in the record function being faster and sometimes the map function being faster, but neither was ever faster by a large margin. I'd encourage you to do your own measurements, but it seems that there's virtually no performance difference between the two, at least for accessing a single field via pattern matching. As the number of fields increases, though, the difference in performance becomes more apparent: for 10 fields, maps are slower by about 0.5%, and for 50 fields, maps are slower by about 50%. But as I stated up front, I see this as being immaterial, since if you're trying to use records and maps interchangeably you're doing it wrong.
UPDATE: based on the conversation in the comments I clarified the answer to discuss performance differences as the number of fields/keys increases, and to point out that records and maps are not meant to be interchangeable.
UPDATE: for Erlang/OTP 24 the Erlang Efficiency Guide was augmented with a chapter on maps that's worth reading for detailed answers to this question.
I have different results when repeat the test with Erlang/OTP 22 [erts-10.6].
Disassembled code is different for r/1:
The record lookup is 1.5+ times faster.
{function, r, 1, 2}.
{label,1}.
{line,[{location,"f2.erl",9}]}.
{func_info,{atom,f2},{atom,r},1}.
{label,2}.
{test,is_tagged_tuple,{f,1},[{x,0},2,{atom,foo}]}.
{get_tuple_element,{x,0},1,{x,0}}.
return.
{function, m, 1, 4}.
{label,3}.
{line,[{location,"f2.erl",12}]}.
{func_info,{atom,f2},{atom,m},1}.
{label,4}.
{test,is_map,{f,3},[{x,0}]}.
{get_map_elements,{f,3},{x,0},{list,[{atom,f},{x,1}]}}.
{move,{x,1},{x,0}}.
return.
9> lists:sum([element(1,timer:tc(f2,call_r,[10000000])) || _ <- lists:seq(1,50)])/50.
234309.04
10> lists:sum([element(1,timer:tc(f2,call_m,[10000000])) || _ <- lists:seq(1,50)])/50.
341411.9
After I declared -compile({inline, [r/1, m/1]}).
13> lists:sum([element(1,timer:tc(f2,call_r,[10000000])) || _ <- lists:seq(1,50)])/50.
199978.9
14> lists:sum([element(1,timer:tc(f2,call_m,[10000000])) || _ <- lists:seq(1,50)])/50.
356002.48
I compared record with 10 elements to map of the same size. In this case records proved to be more than 2 times faster.
-module(f22).
-compile({inline, [r/1, m/1]}).
-export([call_r/1, call_r/2, call_m/1, call_m/2]).
-define(I, '2').
-define(V, 2 ).
-record(foo, {
'1',
'2',
'3',
'4',
'5',
'6',
'7',
'8',
'9',
'0'
}).
r(#foo{?I = X}) ->
X.
m(#{?I := X}) ->
X.
call_r(N) ->
call_r(#foo{
'1' = 1,
'2' = 2,
'3' = 3,
'4' = 4,
'5' = 5,
'6' = 6,
'7' = 7,
'8' = 8,
'9' = 9,
'0' = 0
}, N).
call_r(_,0) ->
ok;
call_r(F,N) ->
?V = r(F),
call_r(F,N-1).
call_m(N) ->
call_m(#{
'1' => 1,
'2' => 2,
'3' => 3,
'4' => 4,
'5' => 5,
'6' => 6,
'7' => 7,
'8' => 8,
'9' => 9,
'0' => 0
}, N).
call_m(_,0) ->
ok;
call_m(F,N) ->
?V = m(F),
call_m(F,N-1).
% lists:sum([element(1,timer:tc(f22,call_r,[10000000])) || _ <- lists:seq(1,50)])/50.
% 229777.3
% lists:sum([element(1,timer:tc(f22,call_m,[10000000])) || _ <- lists:seq(1,50)])/50.
% 395897.68
% After declaring
% -compile({inline, [r/1, m/1]}).
% lists:sum([element(1,timer:tc(f22,call_r,[10000000])) || _ <- lists:seq(1,50)])/50.
% 130859.98
% lists:sum([element(1,timer:tc(f22,call_m,[10000000])) || _ <- lists:seq(1,50)])/50.
% 306490.6
% 306490.6 / 130859.98 .
% 2.34212629407401
i tried to implement binary_search in erlang :
binary_search(X , List) ->
case {is_number(x) , is_list(List)} of
{false , false} -> {error};
{false , true} -> {error} ;
{true , false} -> {error} ;
{true , true} ->
Length = length(List) ,
case Length of
0 -> {false};
1 -> case lists:member(X , List) of
true -> {true};
false -> {false}
end ;
_ ->
Middle = (Length + 1) div 2 ,
case X >= Middle of
true -> binary_search(X , lists:sublist(List , Middle , Length));
false -> binary_search(X , lists:sublist(List , 1 , Middle))
end
end
end .
However when i try to compile it , i get the following error : "this clause cannot match because of different types/sizes" in the two lines :
{true , false} -> {error} ;
{true , true} ->
is_number(x) will always return false since you made a typo: x instead of X, an atom instead of a variable.
BTW, I don't know what you are experiencing, but the whole code can be written as:
binary_search(X , [_|_] = List) when is_number(X) ->
{lists:member(X,List)};
binary_search(_,_) -> {error}.
Context: The OP's post appears to be a learning example -- an attempt to understand binary search in Erlang -- and is treated as one below (hence the calls to io:format/2 each iteration of the inner function). In production lists:member/2 should be used as noted by Steve Vinoski in a comment below, or lists:member/2 guarded by a function head as in Pascal's answer. What follows is a manual implementation of binary search.
Pascal is correct about the typo, but this code has more fundamental problems. Instead of just finding the typo let's see if we can obviate the need for this nested case checking entirely.
(The code as written above won't work anyway because X should not represent the value of an index, but rather the value that is held at that index, so Middle will likely never match X. Also, there is another issue: you don't cover all the base cases (cases in which you should stop recursing). So the inner function below covers them all up front as matches within the function head, so it is more obvious how the search works. Note the Middle + 1 when X > Value, by the way; contemplate why this is necessary.)
Two main notes on Erlang style
First: If you receive the wrong sort of data, just crash, don't return an error. With that in mind, consider using a guard.
Second: If you find yourself doing lots of cases, you can usually simplify your life by making them named functions. This gives you two advantages:
A much better crash report than you will get within nested case expressions.
A named, pure function can be tested and even formally verified rather easily if it is small enough -- which is also pretty cool. (As a side note, the religion of testing tests my patience and sanity at times, but when you have pure functions you actually can test at least those parts of your program -- so distilling out as much of this sort of thing as possible is a big win.)
Below I do both, and this should obviate the issue you ran into as well as make things a bit easier to read/sort through mentally:
%% Don't return errors, just crash.
%% Only check the data on entry.
%% Guarantee the data is sorted, as this is fundamental to binary search.
binary_search(X, List)
when is_number(X),
is_list(List) ->
bs(X, lists:sort(List)).
%% Get all of our obvious base cases out of the way as matches.
%% Note the lack of type checking; its already been done.
bs(_, []) -> false;
bs(X, [X]) -> true;
bs(X, [_]) -> false;
bs(X, List) ->
ok = io:format("bs(~p, ~p)~n", [X, List]),
Length = length(List),
Middle = (Length + 1) div 2,
Value = lists:nth(Middle, List),
% This is one of those rare times I find an 'if' to be more
% clear in meaning than a 'case'.
if
X == Value -> true;
X > Value -> bs(X, lists:sublist(List, Middle + 1, Length));
X < Value -> bs(X, lists:sublist(List, 1, Middle))
end.
Disclaimer: I kept this because some things may be useful to others, however, it does not solve what I had initially tried to do.
Right now, I'm trying to solve the following:
Given something like {a, B, {c, D}} I want to scan through Erlang forms given to parse_transform/2 and find each use of the send operator (!). Then I want to check the message being sent and determine whether it would fit the pattern {a, B, {c, D}}.
Therefore, consider finding the following form:
{op,17,'!',
{var,17,'Pid'},
{tuple,17,[{atom,17,a},{integer,17,5},{var,17,'SomeVar'}]}}]}]}
Since the message being sent is:
{tuple,17,[{atom,17,a},{integer,17,5},{var,17,'SomeVar'}]}
which is an encoding of {a, 5, SomeVar}, this would match the original pattern of {a, B, {c, D}}.
I'm not exactly sure how I'm going to go about this but do you know of any API functions which could help?
Turning the given {a, B, {c, D}} into a form is possible by first substituting the variables with something, e.g. strings (and taking a note of this), else they'll be unbound, and then using:
> erl_syntax:revert(erl_syntax:abstract({a, "B", {c, "D"}})).
{tuple,0,
[{atom,0,a},
{string,0,"B"},
{tuple,0,[{atom,0,c},{string,0,"D"}]}]}
I was thinking that after getting them in the same format like this, I could analyze them together:
> erl_syntax:type({tuple,0,[{atom,0,a},{string,0,"B"},{tuple,0,[{atom,0,c},string,0,"D"}]}]}).
tuple
%% check whether send argument is also a tuple.
%% then, since it's a tuple, use erl_syntax:tuple_elements/1 and keep comparing in this way, matching anything when you come across a string which was a variable...
I think I'll end up missing something out (and for example recognizing some things but not others ... even though they should have matched).
Are there any API functions which I could use to ease this task? And as for a pattern match test operator or something along those lines, that does not exist right? (i.e. only suggested here: http://erlang.org/pipermail/erlang-questions/2007-December/031449.html).
Edit: (Explaining things from the beginning this time)
Using erl_types as Daniel suggests below is probably doable if you play around with the erl_type() returned by t_from_term/1 i.e. t_from_term/1 takes a term with no free variables so you'd have to stay changing something like {a, B, {c, D}} into {a, '_', {c, '_'}} (i.e. fill the variables), use t_from_term/1 and then go through the returned data structure and change the '_' atoms to variables using the module's t_var/1 or something.
Before explaining how I ended up going about it, let me state the problem a bit better.
Problem
I'm working on a pet project (ErlAOP extension) which I'll be hosting on SourceForge when ready. Basically, another project already exists (ErlAOP) through which one can inject code before/after/around/etc... function calls (see doc if interested).
I wanted to extend this to support injection of code at the send/receive level (because of another project). I've already done this but before hosting the project, I'd like to make some improvements.
Currently, my implementation simply finds each use of the send operator or receive expression and injects a function before/after/around (receive expressions have a little gotcha because of tail recursion). Let's call this function dmfun (dynamic match function).
The user will be specifying that when a message of the form e.g. {a, B, {c, D}} is being sent, then the function do_something/1 should be evaluated before the sending takes place. Therefore, the current implementation injects dmfun before each use of the send op in the source code. Dmfun would then have something like:
case Arg of
{a, B, {c, D}} -> do_something(Arg);
_ -> continue
end
where Arg can simply be passed to dmfun/1 because you have access to the forms generated from the source code.
So the problem is that any send operator will have dmfun/1 injected before it (and the send op's message passed as a parameter). But when sending messages like 50, {a, b}, [6, 4, 3] etc... these messages will certainly not match {a, B, {c, D}}, so injecting dmfun/1 at sends with these messages is a waste.
I want to be able to pick out plausible send operations like e.g. Pid ! {a, 5, SomeVar}, or Pid ! {a, X, SomeVar}. In both of these cases, it makes sense to inject dmfun/1 because if at runtime, SomeVar = {c, 50}, then the user supplied do_something/1 should be evaluated (but if SomeVar = 50, then it should not, because we're interested in {a, B, {c, D}} and 50 does not match {c, D}).
I wrote the following prematurely. It doesn't solve the problem I had. I ended up not including this feature. I left the explanation anyway, but if it were up to me, I'd delete this post entirely... I was still experimenting and I don't think what there is here will be of any use to anyone.
Before the explanation, let:
msg_format = the user supplied message format which will determine which messages being sent/received are interesting (e.g. {a, B, {c, D}}).
msg = the actual message being sent in the source code (e.g. Pid ! {a, X, Y}).
I gave the explanation below in a previous edit, but later found out that it wouldn't match some things it should. E.g. when msg_format = {a, B, {c, D}}, msg = {a, 5, SomeVar} wouldn't match when it should (by "match" I mean that dmfun/1 should be injected.
Let's call the "algorithm" outlined below Alg. The approach I took was to execute Alg(msg_format, msg) and Alg(msg, msg_format). The explanation below only goes through one of these. By repeating the same thing only getting a different matching function (matching_fun(msg_format) instead of matching_fun(msg)), and injecting dmfun/1 only if at least one of Alg(msg_format, msg) or Alg(msg, msg_format) returns true, then the result should be the injection of dmfun/1 where the desired message can actually be generated at runtime.
Take the message form you find in the [Forms] given to parse_transform/2 e.g. lets say you find: {op,24,'!',{var,24,'Pid'},{tuple,24,[{atom,24,a},{var,24,'B'},{var,24,'C'}]}}
So you would take {tuple,24,[{atom,24,a},{var,24,'B'},{var,24,'C'}]} which is the message being sent. (bind to Msg).
Do fill_vars(Msg) where:
-define(VARIABLE_FILLER, "_").
-spec fill_vars(erl_parse:abstract_form()) -> erl_parse:abstract_form().
%% #doc This function takes an abstract_form() and replaces all {var, LineNum, Variable} forms with
%% {string, LineNum, ?VARIABLE_FILLER}.
fill_vars(Form) ->
erl_syntax:revert(
erl_syntax_lib:map(
fun(DeltaTree) ->
case erl_syntax:type(DeltaTree) of
variable ->
erl_syntax:string(?VARIABLE_FILLER);
_ ->
DeltaTree
end
end,
Form)).
Do form_to_term/1 on 2's output, where:
form_to_term(Form) -> element(2, erl_eval:exprs([Form], [])).
Do term_to_str/1 on 3's output, where:
-define(inject_str(FormatStr, TermList), lists:flatten(io_lib:format(FormatStr, TermList))).
term_to_str(Term) -> ?inject_str("~p", [Term]).
Do gsub(v(4), "\"_\"", "_"), where v(4) is 4's output and gsub is: (taken from here)
gsub(Str,Old,New) -> RegExp = "\\Q"++Old++"\\E", re:replace(Str,RegExp,New,[global, multiline, {return, list}]).
Bind a variable (e.g. M) to matching_fun(v(5)), where:
matching_fun(StrPattern) ->
form_to_term(
str_to_form(
?inject_str(
"fun(MsgFormat) ->
case MsgFormat of
~s ->
true;
_ ->
false
end
end.", [StrPattern])
)
).
str_to_form(MsgFStr) ->
{_, Tokens, _} = erl_scan:string(end_with_period(MsgFStr)),
{_, Exprs} = erl_parse:parse_exprs(Tokens),
hd(Exprs).
end_with_period(String) ->
case lists:last(String) of
$. -> String;
_ -> String ++ "."
end.
Finally, take the user supplied message format (which is given as a string), e.g. MsgFormat = "{a, B, {c, D}}", and do: MsgFormatTerm = form_to_term(fill_vars(str_to_form(MsgFormat))). Then you can M(MsgFormatTerm).
e.g. with user supplied message format = {a, B, {c, D}}, and Pid ! {a, B, C} found in code:
2> weaver_ext:fill_vars({tuple,24,[{atom,24,a},{var,24,'B'},{var,24,'C'}]}).
{tuple,24,[{atom,24,a},{string,0,"_"},{string,0,"_"}]}
3> weaver_ext:form_to_term(v(2)).
{a,"_","_"}
4> weaver_ext:term_to_str(v(3)).
"{a,\"_\",\"_\"}"
5> weaver_ext:gsub(v(4), "\"_\"", "_").
"{a,_,_}"
6> M = weaver_ext:matching_fun(v(5)).
#Fun<erl_eval.6.13229925>
7> MsgFormatTerm = weaver_ext:form_to_term(weaver_ext:fill_vars(weaver_ext:str_to_form("{a, B, {c, D}}"))).
{a,"_",{c,"_"}}
8> M(MsgFormatTerm).
true
9> M({a, 10, 20}).
true
10> M({b, "_", 20}).
false
There is functionality for this in erl_types (HiPE).
I'm not sure you have the data in the right form for using this module though. I seem to remember that it takes Erlang terms as input. If you figure out the form issue you should be able to do most what you need with erl_types:t_from_term/1 and erl_types:t_is_subtype/2.
It was a long time ago that I last used these and I only ever did my testing runtime, as opposed to compile time. If you want to take a peek at usage pattern from my old code (not working any more) you can find it available at github.
I don't think this is possible at compile time in the general case. Consider:
send_msg(Pid, Msg) ->
Pid ! Msg.
Msg will look like a a var, which is a completely opaque type. You can't tell if it is a tuple or a list or an atom, since anyone could call this function with anything supplied for Msg.
This would be much easier to do at run time instead. Every time you use the ! operator, you'll need to call a wrapper function instead, which tries to match the message you are trying to send, and executes additional processing if the pattern is matched.
So I've been using Erlang for the last eight hours, and I've spent two of those banging my head against the keyboard trying to figure out the exception error my console keeps returning.
I'm writing a dice program to learn erlang. I want it to be able to call from the console through the erlang interpreter. The program accepts a number of dice, and is supposed to generate a list of values. Each value is supposed to be between one and six.
I won't bore you with the dozens of individual micro-changes I made to try and fix the problem (random engineering) but I'll post my code and the error.
The Source:
-module(dice2).
-export([d6/1]).
d6(1) ->
random:uniform(6);
d6(Numdice) ->
Result = [],
d6(Numdice, [Result]).
d6(0, [Finalresult]) ->
{ok, [Finalresult]};
d6(Numdice, [Result]) ->
d6(Numdice - 1, [random:uniform(6) | Result]).
When I run the program from my console like so...
dice2:d6(1).
...I get a random number between one and six like expected.
However when I run the same function with any number higher than one as an argument I get the following exception...
**exception error: no function clause matching dice2:d6(1, [4|3])
... I know I I don't have a function with matching arguments but I don't know how to write a function with variable arguments, and a variable number of arguments.
I tried modifying the function in question like so....
d6(Numdice, [Result]) ->
Newresult = [random:uniform(6) | Result],
d6(Numdice - 1, Newresult).
... but I got essentially the same error. Anyone know what is going on here?
This is basically a type error. When Result is a list, [Result] is a list with one element. E.g., if your function worked, it would always return a list with one element: Finalresult.
This is what happens (using ==> for "reduces to"):
d6(2) ==> %% Result == []
d6(2, [[]]) ==> %% Result == [], let's say random:uniform(6) gives us 3
d6(1, [3]) ==> %% Result == 3, let's say random:uniform(6) gives us 4
d6(0, [4|3]) ==> %% fails, since [Result] can only match one-element lists
Presumably, you don't want [[]] in the first call, and you don't want Result to be 3 in the third call. So this should fix it:
d6(Numdice) -> Result = [], d6(Numdice, Result). %% or just d6(Numdice, []).
d6(0, Finalresult) -> {ok, Finalresult};
d6(Numdice, Result) -> d6(Numdice - 1, [random:uniform(6) | Result]).
Lesson: if a language is dynamically typed, this doesn't mean you can avoid getting the types correct. On the contrary, it means that the compiler won't help you in doing this as much as it could.