I know that lookup time is constant for ETS tables. But I have also heard that the table is kept outside the process, and that when retrieving data it needs to be copied to the process heap, which is expensive. But then, how do I explain this:
18> {Time, [[{ok, Binary}]]} = timer:tc(ets, match, [utilo, {a, '$1'}]).
{0,
[[{ok,<<255,216,255,225,63,254,69,120,105,102,0,0,73,
73,42,0,8,0,0,0,10,0,14,...>>}]]}
19> size(Binary).
1759017
A 1.7 MB binary takes 0 time to be retrieved from the table!?
EDIT: After I saw Odobenus Rosmarus's answer, I decided to convert the binary to list. Here is the result:
1> {ok, B} = file:read_file("IMG_2171.JPG").
{ok,<<255,216,255,225,63,254,69,120,105,102,0,0,73,73,42,
0,8,0,0,0,10,0,14,1,2,0,32,...>>}
2> size(B).
1986392
3> L = binary_to_list(B).
[255,216,255,225,63,254,69,120,105,102,0,0,73,73,42,0,8,0,0,
0,10,0,14,1,2,0,32,0,0|...]
4> length(L).
1986392
5> ets:insert(utilo, {a, L}).
true
6> timer:tc(ets, match, [utilo, {a, '$1'}]).
{106000,
[[[255,216,255,225,63,254,69,120,105,102,0,0,73,73,42,0,8,0,
0,0,10,0,14,1,2|...]]]}
Now it takes 106,000 microseconds to retrieve a 1,986,392-element list from the table, which is still pretty fast, isn't it? Lists take two words per element, so the copied data is roughly 16 MB on a 32-bit VM (about eight times the size of the binary), or twice that on a 64-bit VM.
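As a quick sanity check on the two-words-per-element figure, the (undocumented) erts_debug:flat_size/1 reports the heap size of a term in words; a three-element list of small integers is three cons cells, i.e. six words:
1> erts_debug:flat_size([1,2,3]).
6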
EDIT 2: I started a thread on erlang-questions (http://groups.google.com/group/erlang-programming/browse_thread/thread/5581a8b5b27d4fe1) and it turns out that 0.1 second is pretty much the time it takes to do the memcpy() (i.e. to move the data to the process's heap). On the other hand, Odobenus Rosmarus's answer explains why retrieving the binary takes 0 time.
Binaries themselves (those longer than 64 bytes) are stored in a special shared heap, outside the process heap.
So retrieving a binary from an ETS table copies only the 'ProcBin' part of the binary to the process heap (roughly, a pointer to the start of the binary data in that shared binary memory, plus its size).
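The same applies to message passing: only the small ProcBin is copied, not the binary data itself. A minimal sketch to try in the shell (the exact timing will vary, but it stays tiny regardless of the binary size):
Big = binary:copy(<<0>>, 10 * 1024 * 1024),   %% a 10 MB refc binary
Pid = spawn(fun() -> receive _ -> ok end end),
timer:tc(erlang, send, [Pid, Big]).           %% {Microseconds, _}, not proportional to 10 MB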
I want to find the total size of a mnesia database. I have only one node.
Can I get the size of mnesia from some function or can I calculate it somehow?
I have looked in the documentation (http://erlang.org/doc/man/mnesia.html), but I cannot find a function that gives such information for the whole database.
Do I need to calculate it per table using table_info/2? And if so how?
NOTE: I don't know how to do that from the data points I currently have; for my test table with only 2 entries, size is 2 and memory is 348.
You need to iterate over all the tables with mnesia:system_info(tables) and read each table's memory with mnesia:table_info(Table, memory) to obtain the number of words occupied by the table. To convert that value to bytes, first get the word size in bytes for your machine architecture with erlang:system_info(wordsize) (on a 32-bit system a word is 4 bytes; on a 64-bit system it is 8 bytes) and multiply it by the table's memory. A rough implementation:
%% Obtain the memory in bytes of all the Mnesia tables.
-spec get_mnesia_memory() -> MemInBytes :: number().
get_mnesia_memory() ->
    WordSize = erlang:system_info(wordsize),
    CollectMem = fun(Tbl, Acc) ->
                     Mem = mnesia:table_info(Tbl, memory) * WordSize,
                     Acc + Mem
                 end,
    lists:foldl(CollectMem, 0, mnesia:system_info(tables)).
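If you want the numbers per table instead, as the question hints at with table_info/2, a minimal sketch along the same lines (mnesia_table_sizes/0 is a hypothetical helper name, not part of the answer above) would be:
mnesia_table_sizes() ->
    WordSize = erlang:system_info(wordsize),
    [{Tbl, mnesia:table_info(Tbl, memory) * WordSize}
     || Tbl <- mnesia:system_info(tables)].
This returns a list of {Table, Bytes} pairs, one per Mnesia table.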
Why does the following say that the variable is unbound?
9> {<<A:Length/binary, Rest/binary>>, Length} = {<<1,2,3,4,5>>, 3}.
* 1: variable 'Length' is unbound
It's pretty clear that Length should be 3.
I am trying to have a function with similar pattern matching, i.e.:
parse(<<Body:Length/binary, Rest/binary>>, Length) ->
But it fails for the same reason. How can I achieve the pattern matching I want?
What I am really trying to achieve is to parse incoming TCP stream packets as LTV (Length, Type, Value) triples.
At some point, after I have parsed the Length and the Type, I want to read only up to Length bytes as the value, since the rest will probably belong to the next LTV.
So my parse_value function is like this:
parse_value(Value0, Left, Callback = {Module, Function},
{length, Length, type, Type, value, Value1}) when byte_size(Value0) >= Left ->
<<Value2:Left/binary, Rest/binary>> = Value0,
Module:Function({length, Length, type, Type, value, lists:reverse([Value2 | Value1])}),
if
Rest =:= <<>> ->
{?MODULE, parse, {}};
true ->
parse(Rest, Callback, {})
end;
parse_value(Value0, Left, _, {length, Length, type, Type, value, Value1}) ->
{?MODULE, parse_value, Left - byte_size(Value0), {length, Length, type, Type, value, [Value0 | Value1]}}.
If I could do this pattern matching, I could break it up into something more pleasant to the eye.
The rules for pattern matching are that if a variable X occurs in two subpatterns, as in {X, X}, or {X, [X]}, or similar, then they have to have the same value in both positions, but the matching of each subpattern is still done in the same input environment - bindings from one side do not carry over to the other. The equality check is conceptually done afterwards, as if you had matched on {X, X2} and added a guard X =:= X2. This means that your Length field in the tuple cannot be used as input to the binary pattern, not even if you make it the leftmost element.
However, within a binary pattern, variables bound in a field can be used in other fields following it, left-to-right. Therefore, the following works (using a leading 32-bit size field in the binary):
1> <<Length:32, A:Length/binary, Rest/binary>> = <<0,0,0,3,1,2,3,4,5>>.
<<0,0,0,3,1,2,3,4,5>>
2> A.
<<1,2,3>>
3> Rest.
<<4,5>>
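Applied to the LTV case from the question, a minimal sketch of a function head that relies on this left-to-right rule could look like the following (the 32-bit length and 8-bit type field widths are assumptions; adjust them to your actual protocol):
parse_ltv(<<Length:32, Type:8, Value:Length/binary, Rest/binary>>) ->
    {{ltv, Length, Type, Value}, Rest}.
Length is bound by the first segment and can then be used as the size of the Value segment within the same binary pattern.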
I've run into this before. There is some weirdness between what is happening inside binary syntax and what happens during unification (matching). I suspect that it is just that binary syntax and matching occur at different times in the VM somewhere (we don't know which Length is failing to get assigned -- maybe binary matching is always first in evaluation, so Length is still meaningless). I was once going to dig in and find out, but then I realized that I never really needed to solve this problem -- which might be why it was never "solved".
Fortunately, this won't stop you with whatever you are doing.
Unfortunately, we can't really help further unless you explain the context in which you think this kind of a match is a good idea (you are having an X-Y problem).
In binary parsing you can always force the situation to be one of the following:
Have a fixed-sized header at the beginning of the binary message that tells you the next size element you need (and from there that can continue as a chain of associations endlessly)
Inspect the binary once on entry to determine the size you are looking for, pull that one value, and then begin the real parsing task
Have a set of fields, all of predetermined sizes, that conform to some binary schema standard
Convert the binary to a list and iterate through it with any arbitrary amount of look-ahead and backtracking you might need
Quick Solution
Without knowing anything else about your general problem, a typical solution would look like:
parse(Length, Bin) ->
<<Body:Length/binary, Rest/binary>> = Bin,
ok = do_something(Body),
do_other_stuff(Rest).
But I smell something funky here.
Having things like this in your code is almost always a sign that a more fundamental aspect of the code structure is not in agreement with the data that you are handling.
But deadlines.
Erlang is all about practical code that satisfies your goals in the real world. With that in mind, I suggest that you do something like the above for now, and then return to this problem domain and rethink it. Then refactor it. This will gain you three benefits:
Something will work right away.
You will later learn something fundamental about parsing in general.
Your code will almost certainly run faster if it fits your data better.
Example
Here is an example in the shell:
1> Parse =
1> fun
1> (Length, Bin) when Length =< byte_size(Bin) ->
1> <<Body:Length/binary, Rest/binary>> = Bin,
1> ok = io:format("Chopped off ~p bytes: ~p~n", [Length, Body]),
1> Rest;
1> (Length, Bin) ->
1> ok = io:format("Binary shorter than ~p~n", [Length]),
1> Bin
1> end.
#Fun<erl_eval.12.87737649>
2> Parse(3, <<1,2,3,4,5>>).
Chopped off 3 bytes: <<1,2,3>>
<<4,5>>
3> Parse(8, <<1,2,3,4,5>>).
Binary shorter than 8
<<1,2,3,4,5>>
Note that this version is a little safer, in that we avoid a crash in the case that Length is longer than the binary. This is yet another good reason why maybe we can't do that match in the function head.
Try with the code below:
{<<A:Length/binary, Rest/binary>>, _} = {_, Length} = {<<1,2,3,4,5>>, 3}.
In the shell, the inner match {_, Length} = {<<1,2,3,4,5>>, 3} binds Length first; the outer pattern is then matched against the same value with Length already bound, so it can be used as the segment size.
This question is mentioned a bit in EEP-52:
Any variables used in the expression must have been previously bound, or become bound in the same binary pattern as the expression. That is, the following example is illegal:
illegal_example2(N, <<X:N,T/binary>>) ->
{X,T}.
And explained a bit more in the following e-mail: http://erlang.org/pipermail/eeps/2020-January/000636.html
Illegal. With one exception, matching is not done in a left-to-right
order, but all variables in the pattern will be bound at the same
time. That means that the variables must be bound before the match
starts. For maps, that means that the variables referenced in key
expressions must be bound before the case (or receive) that matches
the map. In a function head, all map keys must be literals.
The exception to this general rule is that within a binary pattern,
the segments are matched from left to right, and a variable bound in a
previous segment can be used in the size expression for a segment
later in the binary pattern.
Also, one of the members of the OTP team mentioned that they made a prototype that can do that, but it was never finished: http://erlang.org/pipermail/erlang-questions/2020-May/099538.html
We actually tried to make your example legal. The transformation of
the code that we did was not to rewrite to guards, but to match
arguments or parts of arguments in the right order so that input
variables would be bound before being used. (We would do a
topological sort to find the correct order.) For your example, the
transformation would look similar to this:
legal_example(Key, Map) ->
case Map of
#{Key := Value} -> Value;
_ -> error(function_clause, [Key, Map])
end.
In the prototype implementation, the compiler could compile the
following example:
convoluted(Ref,
#{ node(Ref) := NodeId, Loop := universal_answer},
[{NodeId, Size} | T],
<<Int:(Size*8+length(T)),Loop>>) when is_reference(Ref) ->
Int.
Things started to fall apart when variables were repeated. Repeated
variables in patterns already have a meaning in Erlang (they have to
be the same), so it became tricky to distinguish between variables
being bound and variables being used as a binary size or map key.
Here is an example that the prototype couldn't handle:
foo(#{K := K}, K) -> ok.
A human can see that it should be transformed similar to this:
foo(Map, K) ->
    case Map of
        #{K := V} when K =:= V -> ok
    end.
Here are a few other examples that should work but that the prototype
would refuse to compile (often emitting an incomprehensible error message):
bin2(<<Sz:8,X:Sz>>, <<Y:Sz>>) -> {X,Y}.
repeated_vars(#{K := #{K := K}}, K) -> K.
match_map_bs(#{K1 := {bin,<<Int:Sz>>}, K2 := <<Sz:8>>}, {K1,K2}) ->
Int.
Another problem was that even when an example was correctly rejected,
the error message would be confusing.
Because much more work would clearly be needed, we have shelved the
idea for now. Personally, I am not sure that the idea is sound in the
first place. But I am sure of one thing: the implementation would be
very complicated.
UPD: latest news from 2020-05-14
My project has blown through the maximum of 1M atoms. We've cranked up the limit, but I need to apply some sanity to the code that people are submitting with regard to list_to_atom and its friends. I'd like to start by getting a list of all the registered atoms so I can see where the largest offenders are. Is there any way to do this? I'll have to be creative about how I do it so I don't end up trying to dump 1-2M lines in a live console.
You can get hold of all atoms by using an undocumented feature of the external term format.
TL;DR: Paste the following line into the Erlang shell of your running node. Read on for explanation and a non-terse version of the code.
(fun F(N)->try binary_to_term(<<131,75,N:24>>) of A->[A]++F(N+1) catch error:badarg->[]end end)(0).
Elixir version by Ivar Vong:
for i <- 0..:erlang.system_info(:atom_count)-1, do: :erlang.binary_to_term(<<131,75,i::24>>)
An Erlang term encoded in the external term format starts with the byte 131, then a byte identifying the type, and then the actual data. I found that EEP-43 mentions all the possible types, including ATOM_INTERNAL_REF3 with type byte 75, which isn't mentioned in the official documentation of the external term format.
For ATOM_INTERNAL_REF3, the data is an index into the atom table, encoded as a 24-bit integer. We can easily create such a binary: <<131,75,N:24>>
For example, in my Erlang VM, false seems to be the zeroth atom in the atom table:
> binary_to_term(<<131,75,0:24>>).
false
There's no simple way to find the number of atoms currently in the atom table*, but we can keep increasing the number until we get a badarg error.
So this little module gives you a list of all atoms:
-module(all_atoms).
-export([all_atoms/0]).
atom_by_number(N) ->
binary_to_term(<<131,75,N:24>>).
all_atoms() ->
atoms_starting_at(0).
atoms_starting_at(N) ->
try atom_by_number(N) of
Atom ->
[Atom] ++ atoms_starting_at(N + 1)
catch
error:badarg ->
[]
end.
The output looks like:
> all_atoms:all_atoms().
[false,true,'_',nonode#nohost,'$end_of_table','','fun',
infinity,timeout,normal,call,return,throw,error,exit,
undefined,nocatch,undefined_function,undefined_lambda,
'DOWN','UP','EXIT',aborted,abs_path,absoluteURI,ac,accessor,
active,all|...]
> length(v(-1)).
9821
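Since the original concern was dumping one or two million lines in a live console, a small hypothetical helper (dump_atoms/1, not part of the module above) could write the list to a file instead of printing it:
dump_atoms(Filename) ->
    Lines = [io_lib:format("~p~n", [A]) || A <- all_atoms()],
    ok = file:write_file(Filename, unicode:characters_to_binary(Lines)).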
* In Erlang/OTP 20.0, you can call erlang:system_info(atom_count):
> length(all_atoms:all_atoms()) == erlang:system_info(atom_count).
true
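With atom_count available, an Erlang counterpart of the Elixir one-liner above would be (assuming OTP 20 or later):
[binary_to_term(<<131,75,I:24>>) || I <- lists:seq(0, erlang:system_info(atom_count) - 1)].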
I'm not sure if there's a way to do it on a live system, but if you can run it in a test environment you should be able to get a list via crash dump. The atom table is near the end of the crash dump format. You can create a crash dump via erlang:halt/1, but that will bring down the whole runtime system.
I dare say that if you use more than 1M atoms, you are doing something wrong. Atoms are intended to be static once the application is running, or at least bounded by some fairly small number, 3000 or so for a medium-sized application.
Be very careful when an attacker can generate atoms in your VM; calls like list_to_atom/1 in particular are somewhat dangerous.
EDITED (this turned out to be the wrong answer):
You can adjust the maximum number of atoms with the +t emulator flag (http://www.erlang.org/doc/efficiency_guide/advanced.html), but I know of very few use cases where that is necessary.
You can track atom statistics with erlang:memory().
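For example, erlang:memory(atom) reports the bytes allocated for the atom table and erlang:memory(atom_used) the bytes actually in use; a quick check in the shell (values will vary per node):
{erlang:memory(atom), erlang:memory(atom_used)}.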
I want to measure the performance of my database by measuring the time taken to do something as the number of processes increases. The intention is to plot a graph of performance vs. number of processes afterwards. Does anyone have an idea how? I am a beginner in Erlang, please help.
Assuming your database is Mnesia, this should not be hard. One way would be to have a write function and a read function. However, note that there are several activity access contexts in Mnesia. To test write times, you should NOT use the transaction context, because it returns immediately to the calling process, even before a disc write has occurred. For disc writes it is important that you look at the context called sync_transaction. Here is an example:
write(Record)->
Fun = fun(R)-> mnesia:write(R) end,
mnesia:activity(sync_transaction,Fun,[Record],mnesia_frag).
The function above will return only when all active replicas of the Mnesia table have committed the record to the disc file. Hence, to test the speed as processes increase, you need a record generator, a process spawner, the write function and, finally, a timing mechanism. For timing we have the built-in functions timer:tc/1, timer:tc/2 and timer:tc/3, which return the exact time it took to execute a given function completely. To cut the story short, this is how I would do it:
-module(stress_test).
-compile(export_all).
-define(LIMIT,10000).
-record(book,{
isbn,
title,
price,
version}).
%% ensure this table is {type,bag}
-record(write_time,{
isbn,
num_of_processes,
write_time
}).
%% Assuming table (book) already exists
%% Assuming mnesia running already
start()->
ensure_gproc(),
tv:start(),
spawn_many(?LIMIT).
spawn_many(0)-> ok;
spawn_many(N)->
spawn(?MODULE,process,[]),
spawn_many(N - 1).
process()->
gproc:reg({n, l,guid()},ignored),
timer:apply_interval(timer:seconds(2),?MODULE,write,[]),
receive
<<"stop">> -> exit(normal)
end.
total_processes()->
proplists:get_value(size,ets:info(gproc)) div 3.
ensure_gproc()->
case lists:keymember(gproc,1,application:which_applications()) of
true -> ok;
false -> application:start(gproc)
end.
guid()->
random:seed(now()),
MD5 = erlang:md5(term_to_binary([random:uniform(152629977),{node(), now(), make_ref()}])),
MD5List = lists:nthtail(3, binary_to_list(MD5)),
F = fun(N) -> f("~2.16.0B", [N]) end,
L = [F(N) || N <- MD5List],
lists:flatten(L).
generate_record()->
#book{isbn = guid(),title = guid(),price = guid()}.
write()->
Record = generate_record(),
Fun = fun(R)-> ok = mnesia:write(R),ok end,
%% Here is now the actual write we measure
{Time,ok} = timer:tc(mnesia,activity,[sync_transaction,Fun,[Record],mnesia_frag]),
%% The we save that time, the number of processes
%% at that instant
NoteTime = #write_time{
isbn = Record#book.isbn,
num_of_processes = total_processes(),
write_time = Time
},
mnesia:activity(transaction,Fun,[NoteTime],mnesia_frag).
Now there are dependencies here, especially gproc: download it and build it into your Erlang lib path. To run this, just call stress_test:start(). The write_time table will help you draw a graph of the number of processes against the time taken to write. As the number of processes increases from 0 to the upper limit (?LIMIT), we note the time taken to write a given record at that instant, and we also note the number of processes at that moment.
UPDATE
f(S)-> f(S,[]).
f(S,Args) -> lists:flatten(io_lib:format(S, Args)).
That is the missing function, apologies. Remember to study the write_time table using the application tv: a window is opened in which you can examine the Mnesia tables. Use this table to see write times increasing (i.e. performance decreasing) as the number of processes grows. One element I have left out is recording the actual time of each write using time(), which may be an important parameter; you may add it to the definition of the write_time table.
Also look at http://wiki.basho.com/Benchmarking.html
You might also look at Tsung: http://tsung.erlang-projects.org/
As suggested in answers to a previous question, I tried using Erlang proplists to implement a prefix trie.
The code seems to work decently well... But, for some reason, it doesn't play well with the interactive shell. When I try to run it, the shell hangs:
> Trie = trie:from_dict(). % Creates a trie from a dictionary
% ... the trie is printed ...
% Then nothing happens
I see the new trie printed to the screen (ie, the call to trie:from_dict() has returned), then the shell just hangs. No new > prompt comes up and ^g doesn't do anything (but ^c will eventually kill it off).
With a very small dictionary (the first 50 lines of /usr/share/dict/words), the hang only lasts a second or two (and the trie is built almost instantly)... But it seems to grow exponentially with the size of the dictionary (100 words takes 5 or 10 seconds, I haven't had the patience to try larger wordlists). Also, as the shell is hanging, I notice that the beam.smp process starts eating up a lot of memory (somewhere between 1 and 2 gigs).
So, is there anything obvious that could be causing this shell hang and incredible memory usage?
Some additional comments:
I have a hunch that the garbage collector is at fault, but I don't know how to profile or create an experiment to test that.
I've tried profiling with eprof and nothing obvious showed up.
Here is my "add string to trie" function:
add([], Trie) ->
[ stop | Trie ];
add([Ch|Rest], Trie) ->
SubTrie = proplists:get_value(Ch, Trie, []),
NewSubTrie = add(Rest, SubTrie),
NewTrie = [ { Ch, NewSubTrie } | Trie ],
% Arbitrarily decide to compress key/value list once it gets
% more than 60 pairs.
if length(NewTrie) > 60 ->
proplists:compact(NewTrie);
true ->
NewTrie
end.
The problem (amongst others? See my comment) is that you are always adding a new {Ch, NewSubTrie} tuple to your proplist Trie, regardless of whether Ch already existed or not.
Instead of
NewTrie = [ { Ch, NewSubTrie } | Trie ]
you need something like:
NewTrie = lists:keystore(Ch, 1, Trie, {Ch, NewSubTrie})
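Putting it together, a minimal sketch of the corrected add/2 from the question (the 60-pair compaction step is left out here for brevity):
add([], Trie) ->
    [stop | Trie];
add([Ch | Rest], Trie) ->
    SubTrie = proplists:get_value(Ch, Trie, []),
    NewSubTrie = add(Rest, SubTrie),
    lists:keystore(Ch, 1, Trie, {Ch, NewSubTrie}).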
You're not really building a trie here. Your end result is effectively a randomly ordered proplist of proplists that requires full scans at each level when walking the list. Tries typically imply ordering based on position in the array (or list).
Here's an implementation that uses tuples as the storage mechanism. Calling set only rebuilds the root and direct path tuples.
(Note: you would probably have to make the pair a triple (adding a size) to make delete work with any efficiency.)
I believe Erlang tuples are really just arrays (I thought I read that somewhere), so lookup should be very fast, and modification is probably straightforward. Maybe this would be faster with the array module, but I haven't really played with it enough to know.
This version also stores an arbitrary value, so you can do things like:
1> c(trie).
{ok,trie}
2> trie:get("ab",trie:set("aa",bar,trie:new("ab",foo))).
foo
3> trie:get("abc",trie:set("aa",bar,trie:new("ab",foo))).
undefined
4>
Code (entire module). Note: it assumes lower-case, non-empty string keys.
-module(trie).
-compile(export_all).
-define(NEW,{ %% 26 pairs, to avoid cost of calculating a new level at runtime
{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},
{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},
{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},
{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},
{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},
{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},
{undefined,nodepth},{undefined,nodepth}
}
).
-define(POS(Ch), Ch - $a + 1).
new(Key,V) -> set(Key,V,?NEW).
set([H],V,Trie) ->
Pos = ?POS(H),
{_,SubTrie} = element(Pos,Trie),
setelement(Pos,Trie,{V,SubTrie});
set([H|T],V,Trie) ->
Pos = ?POS(H),
{SubKey,SubTrie} = element(Pos,Trie),
case SubTrie of
nodepth -> setelement(Pos,Trie,{SubKey,set(T,V,?NEW)});
SubTrie -> setelement(Pos,Trie,{SubKey,set(T,V,SubTrie)})
end.
get([H],Trie) ->
{Val,_} = element(?POS(H),Trie),
Val;
get([H|T],Trie) ->
case element(?POS(H),Trie) of
{_,nodepth} -> undefined;
{_,SubTrie} -> get(T,SubTrie)
end.