Can Erlang ETS tables be shared among different processes? That is, if I have two processes running on different Erlang runtime systems, can I somehow link them so that all the changes I make in one ETS table will be reflected in the other?
Within a single Erlang node, ETS tables can be fully shared by passing the public option to ets:new. (Beware that the table will be destroyed if its owner dies, though, unless you have set up an heir.)
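A minimal sketch of a node-local shared table (the table name `shared` and the heir process are made up for illustration):

```erlang
%% Any process on this node may read and write a public named table.
%% {heir, HeirPid, Data} keeps the table alive if the creator dies.
Tab = ets:new(shared, [set, public, named_table, {heir, HeirPid, none}]),
true = ets:insert(Tab, {key, 42}),

%% From any other process on the same node:
[{key, 42}] = ets:lookup(shared, key).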
If you need to share tables across several Erlang nodes, you need to use Mnesia.
You cannot "share" an ETS table between processes on different nodes; an ETS table is only accessible by processes on the node on which it was created. If you want to share ETS tables across nodes, you will need to create a process on the node that owns the table and access the table from the other nodes through this process. It is not really that difficult.
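A rough sketch of that pattern, with hypothetical node and module names (`table_node@host`, `table_api`), using plain `rpc` as the simplest possible intermediary:

```erlang
%% On the owning node, a module exposing the table (table name assumed):
-module(table_api).
-export([lookup/1]).

lookup(Key) ->
    ets:lookup(shared, Key).

%% From any other node in the same distributed Erlang cluster:
Result = rpc:call('table_node@host', table_api, lookup, [Key]).
```

In a real system you would more likely wrap the table in a `gen_server` so access is serialized and the owner process is supervised.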
I'm working on an Erlang project using Mnesia (some tables ram copies, some tables disc copies, some tables both). In an attempt to optimize a certain read on a ram table, I used an ets lookup rather than the mnesia dirty_read I had been using, and timed both versions of the routine. The ets lookup was significantly faster than the mnesia dirty_read.
My question is whether there is some 'gotcha' or 'catch' to reading an mnesia table using ets rather than mnesia (there must be, otherwise there would be no reason for the slower mnesia read to exist). If it makes any difference, I don't need and am not using anything distributed or multi-node. In other words, I am and will only be using a single node on a single computer.
mnesia:dirty_read does an RPC call even if the table is local. It also checks the current activity context and maintains it even for dirty lookups. This results in the extra time required for the lookup.
In your case (a single node with local mnesia), a direct ets lookup should work, but it is not recommended, as it depends on mnesia's implementation details. Better would be to use mnesia:ets(Fun [, Args]).
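A sketch of what that looks like, assuming a mnesia table named `my_table` (mnesia backs local ram tables with a named ETS table of the same name):

```erlang
%% Run the fun in mnesia's raw ets activity context: you get the speed
%% of a direct ets lookup while staying inside the mnesia API.
Records = mnesia:ets(fun() -> ets:lookup(my_table, Key) end).
```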
What strategy does Mnesia use to define which nodes will store replicas of particular table?
Can I force Mnesia to use specific number of replicas for each table? Can this number be changed dynamically?
Are there any sources (besides the source code) with detailed (not just overview) description of Mnesia internal algorithms?
Manual. You're responsible for specifying what is replicated where.
Yes, as above, manually. This can be changed dynamically.
I'm afraid (though may be wrong) that none besides the source code.
In terms of documentation, the whole Erlang distribution is hardly the leader in the software world.
Mnesia does not automatically manage the number of replicas of a given table.
You are responsible for specifying each node that will store a table replica (and hence their number). A replica may then be:
stored in memory,
stored on disk,
stored both in memory and on disk,
not stored on that node - in this case the table will be accessible but data will be fetched on demand from some other node(s).
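For example, a sketch of explicit per-node replica placement at table creation time (table name, attributes, and node names are all hypothetical):

```erlang
%% Each storage type lists exactly the nodes that hold that kind of copy.
{atomic, ok} = mnesia:create_table(user,
    [{attributes,  [id, name]},
     {ram_copies,  ['a@host']},            %% memory only on a@host
     {disc_copies, ['b@host', 'c@host']}   %% memory + disk on b and c
    ]).
%% Nodes not listed anywhere can still access `user`, fetching data
%% from the replica nodes on demand.
```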
It's possible to reconfigure the replication strategy when the system is running, though to do it dynamically (based on a node-down event for example) you would have to come up with the solution yourself.
The Mnesia system events could be used to discover a situation when a node goes down; given you know what tables were stored on that node you could check the number of their online replicas based on the nodes which were still online and then perform a replication if needed.
I'm not aware of any application/library which already manages this kind of thing, and it seems like quite an advanced (from my point of view, at least) endeavor to build one.
However, Riak is a database which manages data distribution among its nodes transparently to the user and is configurable with respect to the options you mentioned. That may be the way to go for you.
There is something I don't get: I have two mnesia nodes, one with a ram copy and the other with a disc copy.
My question is:
Do you create the schema only once? But the schema is where you enter the nodes.
I am confused and haven't found good documentation on this.
Let's start by clarifying the concepts. A mnesia cluster consists of nodes and tables; nodes may have copies of the tables. The type of the copy, which may be ram_copies, disc_copies, and disc_only_copies, applies to a given table on a given node. A node may have different types of copies of different tables, and a table may have different types of copies on different nodes. A special case is a node which doesn't have disc based copies at all; it is called a ram only node.
The schema is a special table that stores information about the nodes and tables. Each node must have a copy of this table in the cluster; ram only nodes obviously have a ram copy, other nodes have a disc copy. To be precise, a node must have a disc copy of the schema to have a disc-based copy of any other table.
When you call mnesia:create_schema, you are creating a disc copy of a schema without tables, to be loaded by mnesia when it is started (this function refuses to work if mnesia is already started). If your cluster contains multiple disc-based nodes, the schema is created on all these nodes simultaneously, and when mnesia is started on these nodes, they automatically connect to each other (the nodes know about each other from the schema).
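A sketch of that one-time initialization for a fixed set of disc-based nodes (node names are made up; run this before mnesia has been started anywhere in the cluster):

```erlang
Nodes = ['a@host', 'b@host'],

%% Creates the on-disk schema on all listed nodes at once; fails if
%% mnesia is already running on any of them.
ok = mnesia:create_schema(Nodes),

%% Start mnesia on every node; they find each other via the schema.
{_Results, []} = rpc:multicall(Nodes, mnesia, start, []).
```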
When mnesia cannot load the schema from disk at startup, it creates an empty one for itself in ram (or refuses to start, depending on settings). After that, you can either turn it into a ram-only node by calling mnesia:change_config on a disc-based node of the cluster, in which case the empty schema will be replaced and the node will be synchronized with the rest of the cluster, or you can start creating tables and adding other ram only nodes (which still have an empty schema), building a ram-only cluster.
A ram only node can be turned into a disc node by calling mnesia:change_table_copy_type on the schema table. This way you can build a complete disc-based cluster dynamically from scratch, without creating a disc-based schema beforehand. However, if you have a fixed set of disc nodes, it's much easier to statically initialize the schema on them before starting the cluster for the first time.
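The dynamic path can be sketched as follows, with a hypothetical ram-only node `ram@host` joining a running cluster:

```erlang
%% On a disc-based node of the cluster: attach the ram-only node.
{ok, _} = mnesia:change_config(extra_db_nodes, ['ram@host']),

%% Give the new node a disc copy of the schema, turning it into a
%% disc-based node; after this it can hold disc copies of other tables.
{atomic, ok} =
    mnesia:change_table_copy_type(schema, 'ram@host', disc_copies).
```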
Is there something like a virtual environment or sandboxing for Erlang applications? Is it possible to share nodes between many application owners, knowing that nobody can break another owner's app?
Nodes are the virtual environment for Erlang applications, so you can't just load arbitrary applications into one node and have everything play nice. There are way too many kinds of shared resource to compete for within a node to allow that (module names, registered process names, ETS table names, ...).
However, nodes can very easily communicate with one another more or less transparently, so spinning up a new node for every collection of apps you don't want to manually vet for compatibility is fine. You can obviously run more than one app in a node, but you do have to verify yourself that they won't step on each other's toes.
It doesn't cost you a whole lot of memory or CPU to run multiple nodes, so I would almost always recommend running different erlang systems (collections of apps that work together) in different nodes even if you have only one physical machine.
I have an application developed with Erlang/Mnesia and I am trying to implement redundancy for Mnesia.
I want to add and remove nodes dynamically at runtime and handle synchronization of tables for every newly joining node.
What is the best way to implement this using Erlang and MNesia?
Thanks.
You don't need to implement anything - mnesia already has these features. You can add and remove nodes from a mnesia cluster at runtime, add and remove table copies from nodes within the cluster, and mnesia:wait_for_tables/2 will let you cope with synchronization while adding nodes or table copies. Have a look at the mnesia documentation for more information.
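A sketch of joining a new node at runtime (node and table names are hypothetical; run on an existing cluster node after mnesia has been started on `new@host`):

```erlang
%% Attach the new node to the running cluster.
{ok, _} = mnesia:change_config(extra_db_nodes, ['new@host']),

%% Upgrade its schema to disc-based so it can hold disc copies.
{atomic, ok} =
    mnesia:change_table_copy_type(schema, 'new@host', disc_copies),

%% Replicate a table onto the new node.
{atomic, ok} = mnesia:add_table_copy(my_table, 'new@host', disc_copies),

%% Block (up to 5 s) until the copy is loaded and usable.
ok = mnesia:wait_for_tables([my_table], 5000).
```

Removal is symmetric: mnesia:del_table_copy/2 drops a replica from a node.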