Mnesia asynchronous transaction - erlang

I would like to have a master-slave setup of Erlang nodes, where read and write operations happen on the master node only. Slave nodes are only kept as hot-standbys.
As I understand it, the default behavior of Mnesia is to acquire locks synchronously on all replicating nodes before executing a write operation. This results in high latency, especially for geographically distributed nodes.
My question is: does Mnesia support asynchronous transactions, where locks are only acquired on the master node, and write operations are propagated afterwards towards slave nodes?

I think you will be happier if you build this off-site replication using a message queue system (RabbitMQ perhaps), updating the replicated db yourself from the message-queue feed. WAN links are more likely to become congested or go down, and message-queue protocols have ways to handle that. Erlang distribution just gives up, and you have to spill the updates into a file until the replica comes back up and can consume them.
For best symmetry, make posting to the message queue the primary way to update the db. Then even the master is updated by consuming from the message queue. If a response is needed, the current master can send a message back to the issuer of the update.
Mnesia does have a few different kinds of transaction/activity contexts, but nothing that fits exactly what you want.
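For reference, a minimal sketch of the contexts Mnesia does provide, assuming an illustrative client table holding {client, Id, Value} records. Note that none of them is "lock on the master only and replicate later", which is why the message-queue approach above is suggested instead.

write_tx(Id, Value) ->
    %% transaction: locks are taken on every node holding a replica
    mnesia:activity(transaction,
                    fun() -> mnesia:write({client, Id, Value}) end).

write_sync_tx(Id, Value) ->
    %% sync_transaction: additionally waits until all replicas have committed
    mnesia:activity(sync_transaction,
                    fun() -> mnesia:write({client, Id, Value}) end).

write_dirty(Id, Value) ->
    %% async_dirty: no locks, no transaction; the write is still replicated,
    %% but the caller does not wait for the other nodes
    mnesia:activity(async_dirty,
                    fun() -> mnesia:write({client, Id, Value}) end).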

Maybe your application can benefit from using sticky locks. I guess it is quite close to your needs, but not exactly what you wanted: http://www.erlang.org/documentation/doc-5.8.3/lib/mnesia-4.4.17/doc/html/Mnesia_chap4.html#id70700
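As a hedged sketch (the client table is illustrative), a sticky-lock write looks like this. After the first write the lock "sticks" to the writing node, so subsequent writes from the same node can take the lock locally and only have to renegotiate with the other replicas when a different node asks for it:

write_sticky(Id, Value) ->
    mnesia:activity(transaction,
                    fun() ->
                        mnesia:write(client, {client, Id, Value}, sticky_write)
                    end).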

Interesting Q and equally interesting A!
Basically, what you are suggesting, Christian, is e.g. to have a gen_server serializing access to the DB.
The first time I did that, I then realized: hang on! Mnesia is transactional, so it seems a bit odd to first serialize access and then sort of do it again by updating the DB via a transaction.
I am still a bit puzzled, but given that Mnesia enforces transactional semantics, I tend to take that as a hint that you should not have to serialize access yourself, especially since the implementors of Mnesia probably know the system better than I do ;)
I understand this is not quite a direct answer to your question; however, I'd say use Mnesia + memory nodes + disc nodes: the memory nodes for quick takeover and the disc nodes for recovery after a crash/backup.
HTH,
haavee

Related

What makes erlang scalable?

I am working on an article describing the fundamentals of technologies used by scalable systems. I have worked with Erlang before in a self-learning exercise. I have gone through several articles but have not been able to answer the following questions:
What is in the implementation of Erlang that makes it scalable? What makes it able to run concurrent processes more efficiently than technologies like Java?
What is the relation between functional programming and parallelization? With the declarative syntax of Erlang, do we achieve run-time efficiency?
Does process state not make it heavy? If we have thousands of concurrent users and spawn an equal number of processes (as gen_servers or any other equivalent pattern), each process would maintain its own state. With so many processes, will it not be a drain on the RAM?
If a process has to make DB operations and we spawn multiple instances of that process, eventually the DB will become a bottleneck. This happens even if we use traditional models like Apache-PHP. Almost every business application needs DB access. What then do we gain from using Erlang?
How does process restart help? A process crashes when something is wrong in its logic or in the data. OTP allows you to restart a process. If the logic or data does not change, why would the process not crash again and keep crashing always?
Most articles sing praises about Erlang citing its use in Facebook and Whatsapp. I salute Erlang for being scalable, but also want to technically justify its scalability.
Even if I find answers to these queries on an existing link, that will help.
Regards,
Yash
In short:
It's immutable. There are no mutable variables, only single-assignment bindings to terms such as tuples, lists and atoms. Program execution can be stopped at a breakpoint at any place. A fully transactional model.
Processes are even more lightweight than .NET threads, and they are isolated from one another.
It was made for communication. Millions of connections? Fully asynchronous? Maximum thread safety? A big cross-platform environment built for one purpose, to scale and communicate? That is Ericsson's language, the first in this sphere.
You can choose some imitators like F#, Scala/Akka or Haskell; they try to copy features from Erlang, but only Erlang was born from, and for, one purpose: telecom.
Answers to the other questions you can find on erlang.org, and I suggest you read the documentation. Erlang was built for particular aims, so it's not for every task, and if you are asking about awful things like PHP, Erlang will not be your language.
I'm no Erlang developer (yet), but from what I have read, one of the features that makes it very scalable is that Erlang has its own lightweight processes that use message passing to communicate with each other. Because of this there is no shared state and no locking, which is what you get with, for example, a multi-threaded Java application.
Another difference compared to Java is that the Erlang VM garbage-collects each small process individually, so the pauses are tiny compared to Java, which collects one big heap per VM.
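As a toy sketch of that model (function names are made up), each worker below is a full Erlang process with its own heap and mailbox, and the only way to interact with it is to send it a message:

start_workers(N) ->
    %% spawning a few thousand of these is cheap
    [spawn(fun worker/0) || _ <- lists:seq(1, N)].

worker() ->
    receive
        {From, Msg} when is_pid(From) ->
            %% no shared state: reply by sending a message back
            From ! {self(), {ok, Msg}},
            worker()
    end.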
If you run into bottlenecks from database connections, you could start by using a database pooling app running against, say, a replicated PostgreSQL cluster, or, if you still have bottlenecks, use a replicated NoSQL setup with Mnesia, Riak or CouchDB.
I think process restarts can be very useful when you are experiencing rare bugs that only appear randomly and only when specific criteria are fulfilled. Bugs that crash the application again as soon as it is restarted should ideally be fixed, or handled with a circuit breaker so that the failure does not spread further.
Here is one way process restart helps: by not having to deal with all possible error cases. Say you have a program that divides numbers. Some guy enters a zero to divide by. Instead of checking for that possible error (and tons more), just code the "happy case" and let the process crash when he enters 3/0. It just restarts, and he can figure out what he did wrong.
You can extend this to any number of situations (attempting to read from a non-existent file because the user misspelled it, etc.).
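A minimal, hypothetical sketch of that style (module and function names are made up): the server only codes the happy case and simply crashes on a division by zero, leaving the restart to its supervisor.

-module(divider).
-behaviour(gen_server).
-export([start_link/0, divide/2]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

divide(A, B) ->
    gen_server:call(?MODULE, {divide, A, B}).

init([]) ->
    {ok, no_state}.

%% Only the happy case is coded; {divide, 3, 0} raises badarith,
%% the process crashes, and the supervisor restarts it.
handle_call({divide, A, B}, _From, State) ->
    {reply, A / B, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.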
The big reason for process restart being valuable is that not every error happens every time, and checking that it worked is verbose.
Error handling is typically verbose, so writing it interspersed with the logic that does the task can make the code harder to understand. Moving that logic outside of the task allows you to more clearly distinguish between "doing things" code and "it broke" code. You just let the thing that had a problem fail, and handle it as needed by a supervising party.
Since most errors don't mean that the entire program must stop, only that that particular thing isn't working right, by just restarting the part that broke, you can keep operating in a state of degraded functionality, instead of being down, while you repair the problem.
It should also be noted that the failure recovery is bounded. You have to lay out the limits for how much failure in a certain period of time is too much. If you exceed that limit, the failure propagates to another level of supervision. Each restart includes doing any needed process initialization, which is sometimes enough to fix the problem. For example, in dev, I've accidentally deleted a database file associated with a process. The crashes cascaded up to the level where the file was first created, at which point the problem rectified itself, and everything carried on.
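As a hedged illustration of those limits, here is what the supervisor side might look like, reusing the hypothetical divider module from the earlier sketch: at most 3 crashes within 5 seconds before the failure escalates to the next level of supervision.

-module(worker_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% More than 3 restarts in 5 seconds and this supervisor itself
    %% terminates, propagating the failure to its own supervisor.
    SupFlags = #{strategy => one_for_one, intensity => 3, period => 5},
    Child = #{id => divider,
              start => {divider, start_link, []}},
    {ok, {SupFlags, [Child]}}.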

One replicated mnesia table has become out-of-sync

I have an erlang application currently running on four nodes with a replicated mnesia db that stores minimal data regarding connected clients. The mnesia replication has been working seamlessly in the past (as far as I know anyway) but a client recently noticed that one of the nodes is missing some ids related to his application.
I'm not really sure how this happened. Our network may have had a hiccup at the time. Maybe? But, of more urgency at the moment is getting the data into a good state across all nodes. Is there a way to tell mnesia to replicate from a known-good node?
Mnesia is legendary for this issue. It's a huge PITA.
Looking at it from CAP theorem's point of view, most systems built with Mnesia end up being C-A (consistency-availability with no partition tolerance) systems. For most of the time you have (and heavily rely on) its hard consistency. Then a network partition happens...
It's still available for writes, but these writes destroy consistency. And later on, Mnesia has no mechanism for automatic data repair.
Everyone who uses Mnesia in a cluster should familiarize themselves with these tradeoffs. Your problem is a clear sign that using Mnesia was a poor choice. Doubly so if this data is critical to you.
I too use Mnesia in such a way (sometimes we all need speed you know). But I make sure to only use it to store data that I can easily reconstruct. In general, if you need it stored on disk, Mnesia is no good, except for toy projects.
I make sure to always have this function at hand:
reinit_mnesia_cluster() ->
    %% Stop Mnesia on this node and on every connected node first;
    %% delete_schema/1 only works while Mnesia is stopped.
    rpc:multicall(mnesia, stop, []),
    AllNodes = [node() | nodes()],
    %% Throw away every replica and the schema itself, then rebuild
    %% an empty schema on all nodes and start Mnesia again.
    mnesia:delete_schema(AllNodes),
    mnesia:create_schema(AllNodes),
    rpc:multicall(mnesia, start, []).
Use it only after the network partition has been resolved and all nodes are reachable. This will erase all Mnesia replicas and start it anew. Again, if you can't live with what it does, then using Mnesia was a poor choice.
For important data that needs hard consistency, use SQL. For important data that needs availability, use Riak. For shared state that needs speed, use Redis. Mnesia is no replacement for these systems, although at first it does seem so.
Edit on 2014-11-16: Here is a much better article on the topic, explaining in detail what I said above: https://medium.com/@jlouis666/mnesia-and-cap-d2673a92850
Honestly, I think the cleanest way to get an out-of-sync Mnesia to replicate from a known good node is to shut down the application on the bad node, and delete all its Mnesia database files, then do the following.
Write an escript that starts Mnesia up standalone using the "bad" node name and Mnesia directory, replicates the tables from a known good node, and shuts Mnesia down. Run that escript on the bad node.
The act of replicating the tables and shutting Mnesia down gracefully puts the node back in sync with the cluster. Then, when you start the application up on the bad node, it will join up and stay in sync with the cluster.
Of course, this description lacks precise details, but that's the gist of it. There are surely less brute force ways of doing this, but unless you have massive amounts of data to replicate, I think this way is the quickest and cleanest.
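A hedged sketch of such an escript, under the assumption that the bad node's old Mnesia files have been removed and the tables use disc_copies. The node names, cookie, Mnesia directory and table names below are all placeholders; adapt them to your setup.

#!/usr/bin/env escript
%%! -name app@badhost -setcookie secret -mnesia dir '"/var/app/mnesia"'
%% Run this on the bad node, using the same node name and cookie the
%% application normally uses, while the application itself is stopped.
main(_) ->
    GoodNode = 'app@goodhost',          %% known good node (placeholder)
    Tables   = [clients, sessions],     %% illustrative table names
    ok = mnesia:start(),
    %% attach to the existing cluster through the good node
    {ok, _} = mnesia:change_config(extra_db_nodes, [GoodNode]),
    %% make the freshly created local schema disc-based again
    {atomic, ok} = mnesia:change_table_copy_type(schema, node(), disc_copies),
    %% pull a fresh copy of each table from the cluster
    [{atomic, ok} = mnesia:add_table_copy(T, node(), disc_copies) || T <- Tables],
    ok = mnesia:wait_for_tables(Tables, 60000),
    mnesia:stop().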

Mnesia Replication and Large Numbers of Dirty Operations

Some applications require really fast response times to meet users' expectations. I am building one such application and I am using Mnesia. When we bypass the Mnesia transaction manager, we get good performance. However, here is the problem: we need to replicate this database as part of load balancing; after all, Mnesia does the replication for us. We are using ONLY dirty operations in this application, with a few parts using the async_dirty context. I am wondering, would Mnesia replication be affected if we are not using the transaction context at this scale? Many frequent dirty operations occur on records all the time, so I wonder whether a request made on the side-B replica would find the changes that have just been made on the side-A replica via a dirty operation?
According to Mnesia User's Guide:
async_dirty activities "will wait for the operation to be performed on one node but not the others".
For sync_dirty activities: "The caller will wait for the updates to be performed on all active replicas".
So dirty writes are still replicated, but with async_dirty there is a window during which a read on the side-B replica may not yet see a write that has just completed on side A; sync_dirty closes that window by waiting for every active replica.
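A small sketch of the two contexts, using an illustrative counter table, to make the difference concrete:

bump_fast(Key, Val) ->
    %% returns as soon as the local write is done; replication to the
    %% other copies happens in the background
    mnesia:activity(async_dirty, fun() -> mnesia:write({counter, Key, Val}) end).

bump_everywhere(Key, Val) ->
    %% blocks until every active replica has applied the write
    mnesia:activity(sync_dirty, fun() -> mnesia:write({counter, Key, Val}) end).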

Is it a good idea to use MQ to store data in DB?

I'm going to use RabbitMQ as a message broker and switch most of the scripts to sending data to a queue instead of performing direct writes/reads. A consumer will get those messages and perform the corresponding operations. In my dreams this will give me more flexibility in choosing a DB engine, app-level sharding and so on. But is it a good idea generally? Or am I missing something? The current write load is ~15k inserts/deletes for MySQL and 30-50k sets for the Redis instances. The read load is about the same: ~15-20k selects, and 50-70k gets for Redis.
The biggest issue you'll face is that your DB writes will be processed asynchronously. If a client writes data to the DB and then instantly reads it back, the value might not be what it just inserted, because the Rabbit queue might have been very busy or slow, delaying the update operation. Or an admin might accidentally purge your queue, and then you'll have all these clients thinking their transactions had been committed when nothing was actually stored.
This sounds like a classic case of premature optimization. It's a solution in search of a problem, and you should probably avoid doing it.
With AMQP you can also run non-asynchronous operations using the RPC pattern; with that kind of architecture you can work around the problems that come with asynchronous operations.

Postgresql replication in rails with data-fabric gem

I am currently setting up a master-slave app using Ruby on Rails. I am planning to use data-fabric or octopus gem for handling the read/write connections.
This is my first time setting up master-slave DBs. I am confused by the various open source tools available to implement PostgreSQL replication, e.g. pgpool-II, PGCluster, Bucardo and Hot Standby/Streaming Replication (a built-in feature of PostgreSQL 9.1).
My requirements are
fault tolerance(high availability and no data loss on failover)
load balancing
Thanks in advance
Note: I have gone through the Stack Overflow posts regarding PostgreSQL replication, but they are pretty old and do not help me decide which tool to go with.
In your case, streaming replication is the place to start. It is not very flexible but it does what you need regarding database reads as long as you don't need to replicate between major versions.
Database Replication 101
Database replication is a way to ensure that data saved to a specific server becomes stored in a number of other servers. This is often done to better utilize more limited network connections, ensure fault tolerance (so there is essentially a hot back-up), ensure that read-only queries can be distributed over a larger number of databases, etc. This all must be done without sacrificing the basic guarantees of ACID.
There are a number of different overlapping ways to categorize replication solutions. These include:
Page or file-level vs row-level vs statement-level
Synchronous vs Asynchronous
Master-slave vs Multi-Master
In general understanding replication and the tradeoffs between solutions requires relatively strong understanding of database mechanics and ACID guarantees. I will assume you are relatively familiar with storage mechanics, and deterministic vs non-deterministic operations and the like.
What is Being Replicated? File changes (Physical) vs Row Changes (Logical) vs Statements
The simplest approach is to replicate block changes to files, for example as stored in the write-ahead log in PostgreSQL. This replicates changes at the page level and it requires identical file formats. This means you cannot replicate across major versions, CPU architectures, or operating systems. Anything that could affect the alignment of tuples, for example, will cause the replication to either fail or, worse, corrupt the slave's database. This is the approach streaming replication uses. It is simple to set up, and it always replicates everything in the database cluster.
Additionally this approach means you can easily guarantee that the master and slave databases are identical down to the file level. Because of the fact that the PostgreSQL WAL is cluster-global it is unlikely that this approach will ever replicate anything short of the entire database cluster.
As a description of how this works, suppose I:
UPDATE my_table SET rand_value = random() WHERE id > 10000;
In this case, this changes a bunch of data pages and the file operations are replicated to the replicas. The files remain identical between the master and slave.
Another approach, one taken by Slony, Bucardo, and others is to replicate rows in a logical manner. In this approach, changed rows are flagged and logged, and the changes sent to the replicas. The replicas re-run row operations from the master database. Because these are add-on tools which do not replicate file operations but rather logical database operations, they can replicate across CPU architectures, operating systems, etc. Also they are usually designed so that you can replicate some but not all tables in a database, allowing for a lot of flexibility. On the other hand this leads to a lot of potential for errors. "Oops, that table was not replicated" is a real problem.
In this case when I run the update statement above, a trigger is fired capturing the actual rows inserted and deleted, and these are logged, replicated, and the row operations re-run. Because this happens after random() is run, the databases are logically, but not necessarily physically, identical.
A final approach is statement replication. In this case we replicate statements and re-run the statements on the replicas. Some configurations of PgPool will do this. In this case, you cannot ensure that a database is logically equivalent to its replica if any non-deterministic functions are run. In the statement above, the statement itself will run on each replica, ensuring different pseudorandom numbers in the relevant column.
Synchronous vs Asynchronous
This distinction is important to understand regarding failover guarantees. In an asynchronous replication system, the updates are queued and transferred when possible to the replicas and re-run there. In a synchronous replication system the database which accepts the write will not return a successful commit until at least a certain number of replica databases report a successful commit.
Asynchronous replication is generally more robust and produces better availability than synchronous replication. This is because synchronous replication introduces additional points of failure. If you have one master and one slave, then if either system goes down, your database becomes unavailable at least for write operations.
The tradeoff though is that synchronous replication offers a guarantee that data which is committed is in fact available on replicas in the event that the master, say, suffers catastrophic hardware failure immediately following commit. This is a very low probability event, but in some cases it is important that you know the data is still available. In short this provides additional durability guarantees not present in async replication.
Multi-Master vs Master-Slave
Most replication systems are master-slave. In this case, all writes begin at one node and are replicated to other nodes. Writes may only begin at one node. They may not begin at other nodes. This makes replication straight-forward because we know that the slaves represent a past state of the master.
Multi-master replication allows writes to occur on more than one node. In an asynchronous replication system, this leads to the problem of conflict resolution. These problems are actually worse than most assume once you add DDL statements. Suppose two different users run the above update statement on two different masters. We will now have two sets of records that have to be replicated across the nodes, and they will conflict.
Multi-master replication typically requires that people think through this conflict resolution process quite carefully. It is never a process that just works out of the box. Often times you write your own conflict resolution routines. For this reason I typically recommend avoiding multi-master replication unless you really need it.

Resources