Get or calculate Mnesia total size - erlang

I want to find the total size of a mnesia database. I have only one node.
Can I get the size of mnesia from some function or can I calculate it somehow?
I have looked in the documentation http://erlang.org/doc/man/mnesia.html, but I cannot find a fucntion to get such information for the whole database.
Do I need to calculate it per table using table_info/2? And if so how?
NOTE: I don't know how to do that with the current data points, the size is 2 (for testing I have only 2 entries) and memory is 348.

You need to iterate over all the tables with mnesia:system_info(tables) and read each table memory with mnesia:table_info(Table, memory) to obtain the number of words occupied by your table. To transform that value to bytes, you can first use erlang:system_info(wordsize) to get the word size in bytes for your machine architecture(on a 32-bit system a word is 4 bytes and 64 bits it's 8 bytes) and multiply it by your Mnesia table memory. A rough implementation:
%% Obtain the memory in of bytes of all the mnesia tables.
-spec get_mnesia_memory() -> MemInBytes :: number().
get_mnesia_memory() ->
WordSize = erlang:system_info(wordsize),
CollectMem = fun(Tbl, Acc) ->
Mem = mnesia:table_info(Tbl, memory) * WordSize,
Acc + Memory
end,
lists:foldl(CollectMem, 0, mnesia:system_info(tables)).

Related

Length and size of strings in Elixir / Erlang needs explanation

Can someone explain why s is a string with 4096 chars
iex(9)> s = String.duplicate("x", 4096)
... lots of "x"
iex(10)> String.length(s)
4096
but its memory size are a few 6 words?
iex(11)> :erts_debug.size(s)
6 # WHAT?!
And why s2 is a much shorter string than s
iex(13)> s2 = "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"
"1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"
iex(14)> String.length(s)
50
but its size has more 3 words than s?
iex(15)> :erts_debug.size(s2)
9 # WHAT!?
And why does the size of these strings does not match their lengths?
Thanks
First clue why this is showing that values can be found in this question. Quoting size/1 docs:
%% size(Term)
%% Returns the size of Term in actual heap words. Shared subterms are
%% counted once. Example: If A = [a,b], B =[A,A] then size(B) returns 8,
%% while flat_size(B) returns 12.
Second clue can be found in Erlang documentation about bitstrings implementation.
So in the first case the string is too big to fit on heap alone, so it uses refc binaries which are stored on stack and on heap there is only pointer to given binary.
In second case string is shorter than 64 bytes and it uses heap binaries which is just array of bytes stored directly in on the heap, so that gives us 8 bytes per word (64-bit) * 9 = 72 and when we check documentation about exact memory overhead in VM we see that Erlang uses 3..6 words per binary + data, where data can be shared.

Apache Ignite use too much RAM

I've tried to use Ignite to store events, but face a problem of too much RAM usage during inserting new data
I'm runing ignite node with 1GB Heap and default configuration
curs.execute("""CREATE TABLE trololo (id LONG PRIMARY KEY, user_id LONG, event_type INT, timestamp TIMESTAMP) WITH "template=replicated" """);
n = 10000
for i in range(200):
values = []
for j in range(n):
id_ = i * n + j
event_type = random.randint(1, 5)
user_id = random.randint(1000, 5000)
timestamp = datetime.datetime.utcnow() - timedelta(hours=random.randint(1, 100))
values.append("({id}, {user_id}, {event_type}, '{timestamp}')".format(
id=id_, user_id=user_id, event_type=event_type, uid=uid, timestamp=timestamp.strftime('%Y-%m-%dT%H:%M:%S-00:00')
))
query = "INSERT INTO trololo (id, user_id, event_type, TIMESTAMP) VALUES %s;" % ",".join(values)
curs.execute(query)
But after loading about 10^6 events, I got 100% CPU usage because all heap are taken and GC trying to clean some space (unsuccessfully)
Then I stop for about 10 minutes and after that GC succesfully clean some space and I could continue loading new data
Then again heap fully loaded and all over again
It's really strange behaviour and I couldn't find a way how I could load 10^7 events without those problems
aproximately event should take:
8 + 8 + 4 + 10(timestamp size?) is about 30 bytes
30 bytes x3 (overhead) so it should be less than 100bytes per record
So 10^7 * 10^2 = 10^9 bytes = 1Gb
So it seems that 10^7 events should fit into 1Gb RAM, isn't it?
Actually, since version 2.0, Ignite stores all in offheap with default settings.
The main problem here is that you generate a very big query string with 10000 inserts, that should be parsed and, of course, will be stored in heap. After decreasing this size for each query, you will get better results here.
But also, as you can see in doc for capacity planning, Ignite adds around 200 bytes overhead for each entry. Additionally, add around 200-300MB per node for internal memory and reasonable amount of memory for JVM and GC to operate efficiently
If you really want to use only 1gb heap you can try to tune GC, but I would recommend increasing heap size.

How to distribute the records over mnesia fragments?

I have two erlang MNESIA nodes running in the cluster.
I have created table by the below properties.
mnesia:create_table(vmq_offline_store,[
{frag_properties,[
{node_pool,[node()|nodes()]},
{hash_module,verneDB_frag_hash},
{n_fragments,8},
{n_disc_only_copies,length([node()|nodes()])}]
},
{index,[]},{type, bag},
{attributes,record_info(fields,vmq_offline_store)}]).
I could see all the 8 fragments created on the two erlang nodes.
After this,I inserted 50000 records into the table using RPC call from external node.These 50000 records inserted into only vmq_offline_store. Not distributed over all the fragments.
vmq_offline_store: with 50000 records occupying 2096701142 bytes on disc
vmq_offline_store_frag2: with 0 records occupying 5464 bytes on disc
vmq_offline_store_frag3: with 0 records occupying 5464 bytes on disc
vmq_offline_store_frag4: with 0 records occupying 5464 bytes on disc
vmq_offline_store_frag5: with 0 records occupying 5464 bytes on disc
vmq_offline_store_frag6: with 0 records occupying 5464 bytes on disc
vmq_offline_store_frag7: with 0 records occupying 5464 bytes on disc
vmq_offline_store_frag8: with 0 records occupying 5464 bytes on disc
Could you please help me how to distribute the records over the fragments?
It's not enough to create the Mnesia table with fragmentation properties. Every table operation must explicitly specify the "access module" for fragmented tables, mnesia_frag. This is done by calling the function mnesia:activity/4, instead of calling mnesia:transaction/1 or using dirty operations.
For example, this code:
Fun = fun() -> ... end,
{atomic, Result} = mnesia:transaction(Fun),
becomes:
Fun = fun() -> ... end,
Result = mnesia:activity(transaction, Fun, [], mnesia_frag),
(Note that on errors mnesia:activity signals an error instead of returning {aborted, Reason}.)
For dirty operations, code like this:
mnesia:dirty_write(MyRecord)
becomes:
mnesia:activity(sync_dirty, mnesia, write, [MyRecord], mnesia_frag)
or alternatively:
mnesia:activity(sync_dirty, fun() -> mnesia:write(MyRecord) end, [],
mnesia_frag)
That is, never use the mnesia:dirty_* functions; use the "bare" ones within a dirty activity.

Direct Mapped Cache of Blocks Example

So i have this question in my homework assignment that i have struggling a bit with. I looked over my lecture content/notes and have been able to utilize those to answer the questions, however, i am not 100% sure that i did everything correctly. There are two parts (part C and D) in the question that i was not able to figure out even after consulting my notes and online sources. I am not looking for a solution for those two parts by any means, but it would be greatly appreciated if i could get, at least, a nudge in the right direction in how i can go about solving it.
I know this is a rather large question, however, i hope someone could possibly check my answers and tell me if all my work and methods of looking at this problem is correct. As always, thank you for any help :)
Alright, so now that we have the formalities out of the way,
--------------------------Here is the Question:--------------------------
Suppose a small direct-mapped cache of blocks with 32 blocks is constructed. Each cache block stores
eight 32-bit words. The main memory—which is byte addressable1—is 16,384 bytes in size. 32-bit words are stored
word aligned in memory, i.e., at an address that is divisible by 4.
(a) How many 32-bit words can the memory store (in decimal)?
(b) How many address bits would be required to address each byte of memory?
(c) What is the range of memory addresses, in hex? That is, what are the addresses of the first and last bytes of
memory? I'll give you a hint: memory addresses are numbered starting at 0.
(d) What would be the address of the last word in memory?
(e) Using the cache mapping scheme discussed in the Chapter 5 lecture notes, how many and which address bits
would be used to form the block offset?
(f) How many and which memory address bits would be used to form the cache index?
(g) How many and which address bits would be used to form the tag field for each cache block?
(h) To which cache block (in decimal) would memory address 0x2A5C map to?
(i) What would be the block offset (in decimal) for 0x2A5C?
(j) How many other main memory words would map to the same block as 0x2A5C?
(k) When the word at 0x2A5C is moved into a cache block, what are the memory addresses (in hex) of the other
words which will also be moved into this block? Express your answer as a range, e.g., [0x0000, 0x0200].
(l) The first word of a main memory block that is mapped to a cache block will always be at an address that is
divisible by __ (in decimal)?
(m) Including the V and tag bits of each cache block, what would be the total size of the cache (in bytes)
(n) what would be the size allocated for the data bits (in bytes)?
----------------------My answers and work-----------------------------------
a) memory = 16384 bytes. 16384 bytes into bits = 131072 bits. 131072/32 = 4096 32-bit words
b) 2^14 (main memory) * 2^2 (4 bits/word) = 2^16. take log(base2)(2^16) = 16 bits
c) couldnt figure this part out (would appreciate some input (NOT A SOLUTION) on how i can go about looking at this problem
d)could not figure this part out either :(
e)8 words in each cache line. 8 * 4(2^2 bits/word) = 32 bits in each cache line. log(base2)(2^5) = 5 bits used for block offset.
f) # of blocks = 2^5 = 32 blocks. log(base2)(2^5) = 5 bits for cache index
g) tag = 16 - 5 - 5 - 2(word alignment) = 4 bits
h) 0x2A5C
0010 10100 10111 00
tag index offset word aligned bits
maps to cache block index = 10100 = 0x14
i) maps to block offset = 10111 = 0x17
j) 4 tag bits, 5 block offset = 2^9 other main memory words
k) it is a permutation of the block offsets. so it maps the memory addresses with the same tag and cache index bits and block offsets of 0x00 0x01 0x02 0x04 0x08 0x10 0x11 0x12 0x14 0x18 0x1C 0x1E 0x1F
l)divisible by 4
m) 2(V+tag+data) = 2(1+4+2^3*2^5) = 522 bits = 65.25 bytes
n)data bits = 2^5 blocks * 2^3 words per block = 256 bits = 32 bytes
Part C:
If a memory has M bytes, and the memory is byte addressable, the the memory addresses range from 0 to M - 1.
For your question, this means that memory addresses range from 0 to 16383, or in hex 0x0 to 0x3FFF.
Part D:
Words are 4 bytes long. So given your answer to C, the last word is at:
(0x3FFFF - 3) -> 0x3FFC.
You can see that this is correct because the lowest 2 bits of the address are 0, which must be true of any 4 byte aligned address.

retrieval of data from ETS table

I know that lookup time is constant for ETS tables. But I also heard that the table is kept outside of the process and when retrieving data, it needs to be moved to the process heap. So, this is expensive. But then, how to explain this:
18> {Time, [[{ok, Binary}]]} = timer:tc(ets, match, [utilo, {a, '$1'}]).
{0,
[[{ok,<<255,216,255,225,63,254,69,120,105,102,0,0,73,
73,42,0,8,0,0,0,10,0,14,...>>}]]}
19> size(Binary).
1759017
1.7 MB binary takes 0 time to be retrieved from the table!?
EDIT: After I saw Odobenus Rosmarus's answer, I decided to convert the binary to list. Here is the result:
1> {ok, B} = file:read_file("IMG_2171.JPG").
{ok,<<255,216,255,225,63,254,69,120,105,102,0,0,73,73,42,
0,8,0,0,0,10,0,14,1,2,0,32,...>>}
2> size(B).
1986392
3> L = binary_to_list(B).
[255,216,255,225,63,254,69,120,105,102,0,0,73,73,42,0,8,0,0,
0,10,0,14,1,2,0,32,0,0|...]
4> length(L).
1986392
5> ets:insert(utilo, {a, L}).
true
6> timer:tc(ets, match, [utilo, {a, '$1'}]).
{106000,
[[[255,216,255,225,63,254,69,120,105,102,0,0,73,73,42,0,8,0,
0,0,10,0,14,1,2|...]]]}
Now it takes 106000 microseconds to retrieve 1986392 long list from the table which is pretty fast, isn't it? Lists are 2 words per element. Thus the data is 4x1.7MB.
EDIT 2: I started a thread on erlang-question (http://groups.google.com/group/erlang-programming/browse_thread/thread/5581a8b5b27d4fe1) and it turns out that 0.1 second is pretty much the time it takes to do memcpy() (move the data to the process's heap). On the other hand Odobenus Rosmarus's answer explains why retrieving binary takes 0 time.
binaries itself (that longer than 64 bits) are stored in the special heap, outside of process heap.
So, retrieval of binary from the ets table moves to process heap just 'Procbin' part of binary. (roughly it's pointer to start of binary in the binaries memory and size).

Resources