Size in MB of mnesia table - erlang

How do you read the output of :mnesia.info?
For example, I only have one table, some_table, and :mnesia.info returns this:
---> Processes holding locks <---
---> Processes waiting for locks <---
---> Participant transactions <---
---> Coordinator transactions <---
---> Uncertain transactions <---
---> Active tables <---
some_table: with 16020 records occupying 433455 words of mem
schema : with 2 records occupying 536 words of mem
===> System info in version "4.15.5", debug level = none <===
opt_disc. Directory "/home/ubuntu/project/Mnesia.nonode#nohost" is NOT used.
use fallback at restart = false
running db nodes = [nonode#nohost]
stopped db nodes = []
master node tables = []
remote = []
ram_copies = ['some_table',schema]
disc_copies = []
disc_only_copies = []
[{nonode#nohost,ram_copies}] = [schema,'some_table']
488017 transactions committed, 0 aborted, 0 restarted, 0 logged to disc
0 held locks, 0 in queue; 0 local transactions, 0 remote
0 transactions waits for other nodes: []
Also calling:
:mnesia.table_info("some_table", :size)
It returns 16020, which I think is the number of keys, but how can I get the memory usage?

First, you need mnesia:table_info(Table, memory) to obtain the number of words occupied by your table; in your example you are getting the number of items in the table, not its memory. To convert that value to MB, first use erlang:system_info(wordsize) to get the word size in bytes for your machine architecture (on a 32-bit system a word is 4 bytes; on a 64-bit system it's 8 bytes), multiply it by your Mnesia table memory to obtain the size in bytes, and finally convert the value to megabytes:
MnesiaMemoryMB = (mnesia:table_info(some_table, memory) * erlang:system_info(wordsize)) / (1024*1024).

You can use erlang:system_info(wordsize) to get the word size in bytes: on a 32-bit system a word is 32 bits (4 bytes); on a 64-bit system it's 8 bytes. So your table is using 433455 x wordsize bytes.
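As a quick sanity check, the conversion is plain arithmetic (the 64-bit word size here is an assumption, matching the common case):

```python
# Convert the 433455 words reported by mnesia:table_info/2 to megabytes,
# assuming a 64-bit system (8 bytes per word; it would be 4 on 32-bit).
words = 433455
word_size = 8
size_mb = words * word_size / (1024 * 1024)
print(size_mb)  # roughly 3.3 MB
```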

Related

Get or calculate Mnesia total size

I want to find the total size of a mnesia database. I have only one node.
Can I get the size of mnesia from some function or can I calculate it somehow?
I have looked in the documentation http://erlang.org/doc/man/mnesia.html, but I cannot find a function that returns such information for the whole database.
Do I need to calculate it per table using table_info/2? And if so how?
NOTE: I don't know how to do that with the current data points; the size is 2 (for testing I have only 2 entries) and the memory is 348.
You need to iterate over all the tables with mnesia:system_info(tables) and read each table's memory with mnesia:table_info(Table, memory), which gives the number of words occupied by the table. To convert that value to bytes, use erlang:system_info(wordsize) to get the word size in bytes for your machine architecture (on a 32-bit system a word is 4 bytes; on a 64-bit system it's 8 bytes) and multiply it by the table's memory. A rough implementation:
%% Obtain the total memory in bytes of all the Mnesia tables.
-spec get_mnesia_memory() -> MemInBytes :: number().
get_mnesia_memory() ->
    WordSize = erlang:system_info(wordsize),
    CollectMem = fun(Tbl, Acc) ->
        Mem = mnesia:table_info(Tbl, memory) * WordSize,
        Acc + Mem
    end,
    lists:foldl(CollectMem, 0, mnesia:system_info(tables)).

Apache Ignite uses too much RAM

I've tried to use Ignite to store events, but I am facing a problem of excessive RAM usage while inserting new data.
I'm running an Ignite node with a 1 GB heap and the default configuration:
import random
import datetime
from datetime import timedelta

curs.execute("""CREATE TABLE trololo (id LONG PRIMARY KEY, user_id LONG, event_type INT, timestamp TIMESTAMP) WITH "template=replicated" """)
n = 10000
for i in range(200):
    values = []
    for j in range(n):
        id_ = i * n + j
        event_type = random.randint(1, 5)
        user_id = random.randint(1000, 5000)
        timestamp = datetime.datetime.utcnow() - timedelta(hours=random.randint(1, 100))
        values.append("({id}, {user_id}, {event_type}, '{timestamp}')".format(
            id=id_, user_id=user_id, event_type=event_type,
            timestamp=timestamp.strftime('%Y-%m-%dT%H:%M:%S-00:00')
        ))
    query = "INSERT INTO trololo (id, user_id, event_type, timestamp) VALUES %s;" % ",".join(values)
    curs.execute(query)
But after loading about 10^6 events, I get 100% CPU usage because the whole heap is taken and GC tries (unsuccessfully) to free some space.
If I then stop for about 10 minutes, GC successfully frees some space and I can continue loading new data.
Then the heap fills up again and it starts all over.
It's really strange behaviour, and I couldn't find a way to load 10^7 events without these problems.
Approximately, an event should take:
8 + 8 + 4 + 10 (timestamp size?) = about 30 bytes
30 bytes x 3 (overhead), so it should be less than 100 bytes per record.
So 10^7 x 10^2 = 10^9 bytes = 1 GB.
It seems that 10^7 events should fit into 1 GB of RAM, shouldn't it?
Actually, since version 2.0, Ignite stores all data off-heap with the default settings.
The main problem here is that you generate a very big query string with 10000 inserts, which has to be parsed and, of course, is stored on the heap. After decreasing the size of each query, you will get better results.
Also, as you can see in the capacity planning documentation, Ignite adds around 200 bytes of overhead for each entry. Additionally, add around 200-300 MB per node for internal memory, plus a reasonable amount of memory for the JVM and GC to operate efficiently.
If you really want to use only a 1 GB heap, you can try tuning the GC, but I would recommend increasing the heap size.
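To illustrate the "smaller queries" advice, here is a minimal sketch of splitting the generated rows into smaller batches before building each INSERT. The chunked helper is plain Python; the commented-out usage assumes the curs cursor and values list from the question above.

```python
def chunked(seq, size):
    """Yield successive slices of seq containing at most size elements."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

# Hypothetical usage with the cursor and values from the question:
# for batch in chunked(values, 1000):
#     query = "INSERT INTO trololo (id, user_id, event_type, timestamp) VALUES %s;" % ",".join(batch)
#     curs.execute(query)
```

Each query string then carries at most 1000 rows instead of 10000, so far less garbage accumulates on the heap between GC cycles.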

Mnesia: always suffix fragmented table fragments?

When I create a fragmented table in Mnesia, all of the table fragments will have the suffix _fragN except for the first fragment. This is error-prone, since any code that accesses the table without specifying the correct access module will appear to work, since it reads from and writes to the first fragment, but it will not mix with code using the correct access module, since they will be looking for elements in different places.
Is there a way to tell Mnesia to use a fragment suffix for all table fragments? That would avoid that problem, by making incorrect accesses fail noisily.
For example, if I create a table with four fragments:
1> mnesia:start().
ok
2> mnesia:create_table(foo, [{frag_properties, [{node_pool, [node()]}, {n_fragments, 4}]}]).
{atomic,ok}
then mnesia:info/0 will list the fragments as foo, foo_frag2, foo_frag3 and foo_frag4:
3> mnesia:info().
---> Processes holding locks <---
---> Processes waiting for locks <---
---> Participant transactions <---
---> Coordinator transactions <---
---> Uncertain transactions <---
---> Active tables <---
foo : with 0 records occupying 304 words of mem
foo_frag2 : with 0 records occupying 304 words of mem
foo_frag3 : with 0 records occupying 304 words of mem
foo_frag4 : with 0 records occupying 304 words of mem
schema : with 5 records occupying 950 words of mem
===> System info in version "4.14", debug level = none <===
opt_disc. Directory "/Users/legoscia/Mnesia.nonode#nohost" is NOT used.
use fallback at restart = false
running db nodes = [nonode#nohost]
stopped db nodes = []
master node tables = []
remote = []
ram_copies = [foo,foo_frag2,foo_frag3,foo_frag4,schema]
disc_copies = []
disc_only_copies = []
[{nonode#nohost,ram_copies}] = [schema,foo_frag4,foo_frag3,foo_frag2,foo]
3 transactions committed, 0 aborted, 0 restarted, 0 logged to disc
0 held locks, 0 in queue; 0 local transactions, 0 remote
0 transactions waits for other nodes: []
I'd want foo to be foo_frag1 instead. Is that possible?

How to distribute the records over mnesia fragments?

I have two Erlang Mnesia nodes running in a cluster.
I have created a table with the properties below:
mnesia:create_table(vmq_offline_store,
    [{frag_properties,
      [{node_pool, [node()|nodes()]},
       {hash_module, verneDB_frag_hash},
       {n_fragments, 8},
       {n_disc_only_copies, length([node()|nodes()])}]},
     {index, []},
     {type, bag},
     {attributes, record_info(fields, vmq_offline_store)}]).
I can see all 8 fragments created on the two Erlang nodes.
After this, I inserted 50000 records into the table using an RPC call from an external node. All 50000 records went into vmq_offline_store only; they were not distributed over the fragments:
vmq_offline_store: with 50000 records occupying 2096701142 bytes on disc
vmq_offline_store_frag2: with 0 records occupying 5464 bytes on disc
vmq_offline_store_frag3: with 0 records occupying 5464 bytes on disc
vmq_offline_store_frag4: with 0 records occupying 5464 bytes on disc
vmq_offline_store_frag5: with 0 records occupying 5464 bytes on disc
vmq_offline_store_frag6: with 0 records occupying 5464 bytes on disc
vmq_offline_store_frag7: with 0 records occupying 5464 bytes on disc
vmq_offline_store_frag8: with 0 records occupying 5464 bytes on disc
Could you please help me distribute the records over the fragments?
It's not enough to create the Mnesia table with fragmentation properties. Every table operation must explicitly specify the "access module" for fragmented tables, mnesia_frag. This is done by calling the function mnesia:activity/4, instead of calling mnesia:transaction/1 or using dirty operations.
For example, this code:
Fun = fun() -> ... end,
{atomic, Result} = mnesia:transaction(Fun),
becomes:
Fun = fun() -> ... end,
Result = mnesia:activity(transaction, Fun, [], mnesia_frag),
(Note that on errors mnesia:activity signals an error instead of returning {aborted, Reason}.)
For dirty operations, code like this:
mnesia:dirty_write(MyRecord)
becomes:
mnesia:activity(sync_dirty, mnesia, write, [MyRecord], mnesia_frag)
or alternatively:
mnesia:activity(sync_dirty, fun() -> mnesia:write(MyRecord) end, [], mnesia_frag)
That is, never use the mnesia:dirty_* functions; use the "bare" ones within a dirty activity.

What is the relation between address lines and memory?

These are my assignments:
Write a program to find the number of address lines in an n Kbytes of memory. Assume that n is always to the power of 2.
Sample input: 2
Sample output: 11
I don't need specific coding help, but I don't know the relation between address lines and memory.
In very simple terms, without any bus multiplexing, the number of address lines required to access a memory is the number of bits needed to address it.
Quoting from the Wikipedia article,
a system with a 32-bit address bus can address 2^32 (4,294,967,296) memory locations.
For a simple example, consider this: you have 3 address lines (A, B, C), so the values that can be formed using 3 bits are:
A B C
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
Total: 8 values. So using A, B and C you can access any of those eight values, i.e., you can reach any of those memory addresses.
So, TL;DR, the simple relationship is: with n lines, we can represent 2^n addresses.
An address line usually refers to a physical connection between a CPU/chipset and memory. They specify which address to access in the memory. So the task is to find out how many bits are required to pass the input number as an address.
In your example, the input is 2 kilobytes = 2048 = 2^11, hence the answer 11. If your input is 64 kilobytes, the answer is 16 (65536 = 2^16).
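The assignment itself is then a one-line log2. A minimal sketch in Python (assuming byte-addressable memory and that n is a power of two, as the assignment states):

```python
def address_lines(n_kbytes):
    """Number of address lines needed for n kilobytes of byte-addressable memory."""
    total_bytes = n_kbytes * 1024
    # For a power of two, bit_length() - 1 is exactly log2.
    return total_bytes.bit_length() - 1

print(address_lines(2))   # 11
print(address_lines(64))  # 16
```

This matches the sample: 2 KB = 2048 = 2^11 bytes, so 11 address lines.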
