I am only interested in atomic transactions and strong consistency. Does Firebase Realtime Database support both?
I don't see any mention of transaction locks in Firebase anywhere, and a lock on an item is normally required to support atomicity. That's my first thought as to why Firebase is probably not an atomic database.
I also don't know the back-end architecture of Firebase. I am not sure whether it always reads from a master node or from slave nodes too, so I can't tell whether it is strongly consistent or eventually consistent.
Realtime Database supports transactions. Clients must all agree on how to cooperate with respect to these transactions. The database doesn't support any sort of operation that locks the entire database in order to serialize access from all clients. You need to understand how RTDB transactions work in order to make effective use of them. Not all writes will require a transaction, and you need to figure out for yourself when and how best to use them in your particular application.
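As a rough illustration (not the only way to use them), here is a minimal sketch of an RTDB transaction using the Python Admin SDK; the service-account file, database URL, and data path are placeholders:

```python
import firebase_admin
from firebase_admin import credentials, db

# Placeholder credentials and database URL.
cred = credentials.Certificate("service-account.json")
firebase_admin.initialize_app(cred, {"databaseURL": "https://example-project.firebaseio.com"})

ref = db.reference("counters/page_views")

def increment(current_value):
    # The update function may be re-run if another client changes the value
    # concurrently, so keep it free of side effects.
    return (current_value or 0) + 1

# Applies the update atomically at this location, retrying on conflicts.
ref.transaction(increment)
```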
Since Realtime Database is a cloud-hosted database, you don't need to know (or care) about any sort of master/slave configuration. In fact, you can just assume that it works as the documentation suggests. The documentation suggests that it's eventually consistent if the client is offline at the time of a write operation (which will be cached locally and synchronized when it becomes online). It's immediately consistent if the client is already online, and the client is willing to "wait around" for the latest update as it listens to changing data in the database, whenever it becomes available to the client. (There are actually no "replicas" to speak of with Realtime Database, except the local caches that each client may maintain for themselves for data previously read.)
I have an existing system that uses a relational DBMS. I am unable to use a NoSQL database for various internal reasons.
The system is to get some microservices that will be deployed using Kubernetes and Docker, with the intention of doing rolling upgrades to reduce downtime. The back-end data layer will use the existing relational DBMS. The microservices will follow good practice and "own" their data store on the DBMS. The one big issue seems to be how to manage the structure of the database across all of this. I have done my research:
https://blog.philipphauer.de/databases-challenge-continuous-delivery/
http://www.grahambrooks.com/continuous%20delivery/continuous%20deployment/zero%20down-time/2013/08/29/zero-down-time-relational-databases.html
http://blog.dixo.net/2015/02/blue-turquoise-green-deployment/
https://spring.io/blog/2016/05/31/zero-downtime-deployment-with-a-database
https://www.rainforestqa.com/blog/2014-06-27-zero-downtime-database-migrations/
All of the discussions seem to stop around the point of adding/removing columns and data migration. There is no discussion of how to manage stored procedures, views, triggers etc.
The application is written in .NET Full and .NET Core with Entity Framework as the ORM.
Has anyone got any insights on how to do continuous delivery using a relational DBMS where it is a full production system? Is it back to the drawing board here, in the sense that using a relational DBMS is simply "too hard" for rolling updates?
PS. Even though this is a continuous delivery problem, I have also tagged it with Kubernetes and Docker, as they will be the underlying tech in use for the orchestration/container side of things.
All of the following is under the assumption that I understand correctly what you mean by "rolling updates" and what their consequences are.
It has very little (as in: nothing at all) to do with "relational DBMS". Flat files holding XML will make you face the exact same problem. Your "rolling update" will inevitably cause (hopefully brief) periods of time during which your server-side components (e.g. the DB) must interact with "version 0" as well as with "version -1" of (the client-side components of) your system.
Here "compatibility theory" (*) steps in. A "working system" is a system in which the set of offered services is a superset (perhaps a proper superset) of the set of required services. So backward compatibility is guaranteed if "services offered" is never ever reduced and "services required" is never extended. However, the latter is typically what always happens when the current "version 0" is moved to "-1" and a new "current version 0" is added to the mix. So the conclusion is that "rolling updates" are theoretically doable as long as the "services" offered on server side are only ever extended, and always in such a way as to be, and always remain, a superset of the services required on (any version currently in use on) the client side.
"Services" here is to be interpreted as something very abstract. It might refer to a guarantee to the effect that, say, if column X in this row of this table has value Y then I will find another row in that other table using a key computed such-and-so, and that other row might be guaranteed to have column values satisfying this-or-that condition.
If that "guarantee" is introduced as an expectation (i.e. requirement) on (new version of) client side, you must do something on server side to comply. If that "guarantee" is currently offered but a contradicting guarantee is introduced as an expectation on (new version of) client side, then your rolling update scenario has by definition become inachievable.
(*) http://davidbau.com/archives/2003/12/01/theory_of_compatibility_part_1.html
There are also parts 2 and 3.
I work in an environment that achieves continuous delivery. We use MySQL.
We apply schema changes with minimal interruption by using pt-online-schema-change. One could also use gh-ost.
Adding a column can be done at any time if the application code can work with the extra column in place. For example, it's a good rule to avoid implicit column references like SELECT * or INSERT with no column list. Dropping a column can be done after the app code no longer references that column. Renaming a column is trickier to do without coordinating an app release; in that case you may have to do two schema changes, one to add the new column and a later one to drop the old column once the app is known not to reference it.
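As a hedged sketch of that two-step rename (database, table, and column names invented), each step can be applied online with pt-online-schema-change; it is wrapped in Python here only to keep the examples in one language:

```python
import subprocess

# Hypothetical database/table/column names; connection options omitted for brevity.
# Step 1: add the new column. Deploy app code that writes both columns and reads the new one.
subprocess.run([
    "pt-online-schema-change",
    "--alter", "ADD COLUMN customer_email VARCHAR(255)",
    "D=shop,t=orders",
    "--execute",
], check=True)

# ... later, once no running app version references the old column ...

# Step 2: drop the old column.
subprocess.run([
    "pt-online-schema-change",
    "--alter", "DROP COLUMN customer_mail",
    "D=shop,t=orders",
    "--execute",
], check=True)
```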
We do upgrades and maintenance on database servers by using redundancy. Every database master has a replica, and the two instances are configured in master-master (circular) replication. So one is active and the other is passive. Applications are allowed to connect only to the active instance. The passive instance can be restarted, upgraded, etc.
We can switch the active instance in under 1 second by changing an internal CNAME, and updating the read_only option in each MySQL instance.
Database connections are terminated during this switch. Apps are required to detect a dropped connection and reconnect to the CNAME. This way the app is always connected to the active MySQL instance, freeing the passive instance for maintenance.
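A rough sketch of that switch, assuming a hypothetical helper for the CNAME update (the DNS part depends entirely on your infrastructure):

```python
import mysql.connector  # pip install mysql-connector-python

def set_read_only(host, read_only):
    # Toggle MySQL's global read_only flag on one instance.
    conn = mysql.connector.connect(host=host, user="admin", password="secret")
    try:
        cur = conn.cursor()
        cur.execute("SET GLOBAL read_only = ON" if read_only else "SET GLOBAL read_only = OFF")
    finally:
        conn.close()

def switch_active(old_active, new_active, update_cname):
    set_read_only(old_active, True)                   # stop writes on the old active
    set_read_only(new_active, False)                  # allow writes on the new active
    update_cname("db-active.internal", new_active)    # hypothetical DNS/CNAME helper
    # Per the setup described above, apps detect their dropped connections and
    # reconnect to the CNAME, which now resolves to the new active instance.
```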
MySQL replication is asynchronous, so an instance can be brought down and back up, and it can resume replicating changes and generally catches up quickly, as long as its master keeps the binary logs it needs. If the replica is down for longer than the binary log expiration, then it loses its place and must be reinitialized from a backup of the active instance.
Re comments:
How is the data access code versioned? i.e. v1 of the app talking to v2 of the DB?
That's up to each app developer team. I believe most are doing continual releases, not versions.
How are SPs, UDFs, triggers, etc. dealt with?
No app is using any of those.
Stored routines in MySQL are really more of a liability than a feature. No support for packages or libraries of routines, no compiler, no debugger, bad scalability, and the SP language is unfamiliar and poorly documented. I don't recommend using stored routines in MySQL, even though it's common in Oracle/Microsoft database development practices.
Triggers are not allowed in our environment, because pt-online-schema-change needs to create its own triggers.
MySQL UDFs are compiled C/C++ code that has to be installed on the database server as a shared library. I have never heard of any company that used UDFs in production with MySQL. There is too high a risk that a bug in your C code could crash the whole MySQL server process. In our environment, app developers are not allowed access to the database servers for SOX compliance reasons, so they wouldn't be able to install UDFs anyway.
How safe is it to use Resque for long-pending tasks in Rails (for example: Resque.enqueue_in(7.days, JobClass)) with the resque and resque-scheduler gems? It's worth mentioning that the instance is running on Heroku. What is the chance of losing the queue?
Semi Persistence
As per my comments, the downside of using Redis (which Resque uses) is the possibility of downtime.
As the "RAM" of web apps, Redis keeps semi-persistent data in memory. This basically means it only reliably retains your data whilst it is operating; unless persistence is configured (and on many hosted plans it isn't), any downtime could lose your queue.
This is the crux of your question -
can you afford to lose your queue?
That is a question only you can answer.
There are, however, a number of factors which I would consider:
Redis isn't a full database... it's an in-memory key:value store.
In short, it should only be used as a simple mechanism to hold snippets of data needed at particular times. We use Redis to store the IDs of users on one of our systems, as an example.
This means that if you're storing more data inside Redis than simple IDs or other basic data, you should ask whether you're using it correctly.
How Important Is Your Queue?
As with any app, you'll want to retain as much user data as possible.
The difference, however, is how critical the queue is to your system.
I gave the example of a Hotel Reservation system. Perhaps you have "welcome" emails you need to send, "directions" emails to help guests find the hotel, etc. Are these "mission critical"?
I have designed an email marketing system before. The queuing system for that was extensive; but it wasn't the "queue" that was important - it was all the associated data which went with it.
Thus, we saved our data in a database table and only used Redis to store a pure queue of what was going to be sent, and when. We then ran some scripts every minute (I've forgotten the specifics) which went through the Redis queue.
Bottom line is that I would highly recommend storing the appropriate data in a table, and then only processing the requests as you need them.
The underpinning of the queue is that you need to be able to rebuild it if it goes down. If you're storing your values in Redis alone, that will prove to be a major point of failure.
Hitting the DB is well worth the data you'll get from it in the long run.
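A minimal sketch of that pattern (table and key names invented, SQLite standing in for the real database): the relational store is the source of truth, Redis holds only a queue of IDs, and the queue can be rebuilt after a Redis outage:

```python
import sqlite3          # stands in for your real relational database
import redis            # pip install redis

r = redis.Redis()
db = sqlite3.connect("app.db")
db.execute("""CREATE TABLE IF NOT EXISTS pending_emails (
                  id INTEGER PRIMARY KEY,
                  payload TEXT,
                  send_at TEXT,
                  sent INTEGER DEFAULT 0)""")

QUEUE_KEY = "email_queue"   # invented key name

def enqueue(payload, send_at):
    # Persist the full record first, then push only its ID onto the Redis queue.
    cur = db.execute("INSERT INTO pending_emails (payload, send_at) VALUES (?, ?)",
                     (payload, send_at))
    db.commit()
    r.rpush(QUEUE_KEY, cur.lastrowid)

def rebuild_queue():
    # If Redis loses the queue, repopulate it from the database.
    r.delete(QUEUE_KEY)
    for (row_id,) in db.execute("SELECT id FROM pending_emails WHERE sent = 0"):
        r.rpush(QUEUE_KEY, row_id)
```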
Client DB - CoreData (iOS)
Server DB - MySQL
I am trying to achieve data synchronisation between client and server, but the complicated part is that the schema is highly relational. I have been going through a couple of synchronisation patterns already in use, and it looks like most of them are based on a NoSQL or schemaless DB. I'm wondering if there are any sync patterns for highly relational data. I have already gone through Couchbase, the Dropbox Sync API, Wasabi sync, etc. Following are the concerns:
1) By highly relational data I mean that there are several tables which are related to each other, and Create/Update happens on all of them. Right now I am planning to do separate CRUD requests for each table. Is that a good approach? The problem is that there has to be strict ordering of the requests, because the changes in table-3 cannot be processed before the table-2 data is received. This relationship makes it hard to sync.
2) Change tracking on the client. What would be the best way to identify the changes in a particular table (CoreData entity)? I am planning a delta approach where only the changes in similar kinds of objects are uploaded at a time. Any insights/links on it?
3) Data merging/conflict resolution - I am stuck on this part. One way would be to have a modified timestamp in each object, but what if the devices' dates are not in sync or have been manually changed?
I wanted to know the implications/challenges of such a sync pattern with an RDBMS-backed server, or any alternative approaches.
Problem #1 Explained
Assume there are 10 tables and the APIs expose CRUD requests for these 10 tables. One request will Create/Read/Update/Delete in only one table. So my question is whether it is a good approach to design APIs like this when it comes to offline syncing of data. For example, consider the relational data:
Organization->Employee->Department->Project
Assume some objects in these 4 tables got created offline. Now we need to sync the data to the server when the network is back. So it will be: Create/Update Organisations first; once that is over, Create/Update Employees so that they can be linked to Organisations. So basically a C/U/D will always be issued from the top-level objects down to the bottom-level ones. My question is whether this is a good approach for a sync problem, because if the data were not relational we could have uploaded the changes in all the tables in a single C/U/D API call.
It seems that you might not be aware of typical relational DBMS facilities and protocols that support simultaneous write access by multiple sessions, making them suitable for multi-user, highly concurrent, and OLTP applications.
1) Your API to access MySQL allows you to make your changes atomically (all or nothing) via a transaction. Within that transaction you should update as many tables as possible together, but you can sequence the changes as necessary. By acquiring locks in a consistent order across transactions you avoid deadlocks. You can request that only the parts of tables that a transaction knows it could possibly change are locked, so that non-overlapping clients can proceed concurrently. (A sketch follows these points.)
2) Your schema can explicitly record redundant delta information that you get the DBMS to calculate on updates, or it can record sufficient past changes to calculate deltas on request. Your client can give the DBMS its transaction data and the DBMS can return relevant info based on it and the past. You probably do not need to, and should not, keep any persistent state on your client. That is what the server database is for. The client database is a buffer for it and for user info.
3) You can use an explicit client serial transaction id so that client plus id indicates what order the client thinks its transactions were sent regardless of its clock.
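To make point 1 concrete, here is a hedged sketch (hypothetical table and column names) of applying one client batch as a single transaction, recording the client's own transaction id as in point 3, so either all related rows are committed or none are:

```python
import mysql.connector  # pip install mysql-connector-python

def apply_client_batch(batch, client_id):
    # 'batch' is assumed to look like:
    # {"txn_id": 42, "organizations": [...], "employees": [...]}  (hypothetical shape)
    conn = mysql.connector.connect(host="db.example.com", user="sync",
                                   password="secret", database="app")
    try:
        conn.start_transaction()
        cur = conn.cursor()
        # Parents first, then children, inside the same transaction.
        for org in batch["organizations"]:
            cur.execute("INSERT INTO organization (id, name) VALUES (%s, %s) "
                        "ON DUPLICATE KEY UPDATE name = VALUES(name)",
                        (org["id"], org["name"]))
        for emp in batch["employees"]:
            cur.execute("INSERT INTO employee (id, org_id, name) VALUES (%s, %s, %s) "
                        "ON DUPLICATE KEY UPDATE name = VALUES(name)",
                        (emp["id"], emp["org_id"], emp["name"]))
        # Record the client's own serial transaction id, so ordering does not
        # depend on device clocks.
        cur.execute("INSERT INTO sync_log (client_id, client_txn_id) VALUES (%s, %s)",
                    (client_id, batch["txn_id"]))
        conn.commit()      # all tables change together, or not at all
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()
```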
I wonder how much you have googled.
This is just a general question, not too technical. We have a use case wherein we need to load hundreds of thousands of records into an existing Neo4j database. Now, we cannot afford to take the database offline because of the users who are accessing it. I know that Neo4j requires an exclusive lock on the database while it's performing batch updates. Is there a way around my problem? I don't want to lock my database while doing updates. I still want my users to access it - even for just read-only access. Thanks.
Neo4j never requires an exclusive lock on the entire database. It selectively locks the portions of the graph that are affected by mutating operations. So there are some things you can do to achieve your goal. Are you a Neo4j Enterprise customer?
Option 1: If so, you can run your batch insert on the master node and route users to slaves for reading.
Option 2: Alternatively, you could do a "blue-green" style deployment where you:
take a backup (B) of your existing database (A), then mark the A database read-only
apply your batch inserts onto B either by starting a separate instance, or even better, using BatchInserters. That way, you'll insert your hundreds of thousands in a few seconds
start the new database B
flip a switch on a load balancer, so that users start to be routed to B instead of A
take A down
(Please let me know if you need some tips on how to make a read-only DB.)
Option 3: If you can only afford to run one instance at any one time, then there are techniques you can employ to let your users access the database as usual and still insert large volumes of data. One of them could be using a single-threaded "writer" with a queue that batches write operations. Because one thread only ever writes to the database, you never run into deadlock scenarios and people can happily read from the database. For option 3, I suggest using GraphAware Writer.
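A bare-bones sketch of such a single-threaded writer in Python (the connection details and Cypher are placeholders); readers use the database as usual while one thread drains the write queue:

```python
import queue
import threading
from neo4j import GraphDatabase  # pip install neo4j

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
write_queue = queue.Queue()

def writer_loop():
    # Only this thread ever writes, so writes never deadlock with each other.
    with driver.session() as session:
        while True:
            cypher, params = write_queue.get()
            session.run(cypher, **params)
            write_queue.task_done()

threading.Thread(target=writer_loop, daemon=True).start()

# Producers (e.g. your import code) only enqueue write operations:
write_queue.put(("CREATE (p:Person {name: $name})", {"name": "Alice"}))
write_queue.join()   # wait for the queue to drain before shutting down
```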
I've assumed you are not trying to insert hundreds of thousands of nodes to a running Neo4j database using Cypher. If you are, I would start there and change it to use Java APIs or the BatchInserter API.
My team is currently building a new SaaS application for our company (Amilia.com). We are in "alpha" release and the application was built to be deployed on a web farm.
For our session provider, we are using SQL Server mode (in DEV and TEST) and it does not seem to be "scalable", hence we are looking for the best solution for handling sessions in ASP.NET (MVC3 in our case). We are currently using SQL Server, but we would like to switch to another system due to the license cost.
We target 20 000 [EDITED, was 100k before] concurrent users. In session, we store a GUID, a string and a Cart object (we try to keep it as small as possible; this object saves us 3 queries on each request).
Here are the different solutions I've found :
ASP.NET built-in solutions:
No session : impossible in our case (eliminated)
In-Proc Mode : can't be used in a webfarm. (eliminated)
StateServer Mode : can be used in a webfarm but if the server goes down, I lose all my sessions. (eliminated)
StateServer Mode with a PartitionResolver using multiple servers (http://msdn.microsoft.com/en-ca/magazine/cc163730.aspx#S8). If I understand correctly, if one of these servers goes down, only a part of my users will lose their session.
SqlServer Mode : can be used in a webfarm, if the server goes down, I can recover my sessions but the process is quite slow. Moreover, that database becomes a bottleneck in case of heavy load.
SqlServer Mode with a PartitionResolver using multiple servers (http://www.bulletproofideas.net/2011/01/true-scale-out-model-for-aspnet-session.html) : if one of these servers goes down, only a part of my users will lose their session. If the user was doing nothing during the downtime, he will recover his previous session; otherwise he will be redirected to the sign-in screen.
Custom solutions :
Use MongoDB as Session storage (http://www.adathedev.co.uk/2011/05/mongodb-aspnet-session-state-store.html). It seems to be a good tradeoff, but my knowledge of NoSQL is quite rudimentary, so I cannot see the cons.
Use Memcached : the problem will be the same as with StateServer mode, and if the memcached server goes down, all my sessions are lost. Furthermore, I think memcached is not designed to store session state?
Use distributed memcached like ScaleOut (http://highscalability.com/product-scaleout-stateserver-memcached-steroids) : seems to be the best solution but it costs money.
Use repcached and memcached (http://repcached.lab.klab.org/), I've never seen an implementation of that solution.
We could easily go to Microsoft Azure and use the tools it provides, but we have only one application, so if Microsoft doubles the price, we immediately double our infrastructure cost (but that's another subject).
So, what's the best way or at least what's your opinion about this ?
SQL Server session is pretty good. Since you already have a SQL Server database to store your primary data, you can just create another database and store the ASP.NET Session there.
Regarding scalability, I would say that if you have 100,000 concurrent users, then your user base must be 10 million or more. You should do some practical estimation to see how long it will really take to reach such a concurrent user load. In my previous startup, we had millions of users all around the world, 24x7, but we hardly ever reached 10K concurrent users even though people used our site continuously for hours every day.
If you really have 100,000 concurrent users, license cost would be the least of your worry. With right business model, having 100K concurrent users means you have at least $10M revenue/year.
I have built myoffice.bt.com, which uses SQL Server session state and keeps all primary data on a single SQL Server instance, but in two databases. Between 8 AM and 10 AM, millions of users hit our site. We hardly have any performance issues. With a dual-core server and 8 GB RAM, you can happily run a SQL Server instance and support such a load as long as you code it right. It all depends on how you have coded. If you have followed performance best practices, you can easily scale to millions of users on a single database server.
Take a look at my performance suggestions from:
http://omaralzabir.com/tag/performance/
I have used memcached clusters only to cache frequently used data, never for sessions, for good reason. There have been several occasions where a memcached server had to be rebooted. If we had used memcached for sessions, we would have lost all the sessions stored in that instance. So, I would not recommend storing sessions in memcached. But then again, how important is it for your app to maintain data in session? If you have a shopping cart, then as users add products to the cart, it must be persisted in the database, not in session. Session is usually for short-term storage. For any transactional data, you should never keep it in session; instead store it in relational tables directly.
I am always in support of not using Session. Developers abuse session all the time. Whenever they want to pass data from one page to another, they just put it on the Session. It results in bad design. If you truly want to scale to a 100K concurrent user base, design your app not to use session at all. Any transactional data must be stored in the database. A cart is a transactional object and thus it's not suitable for holding in Session. At some point you will need to know how many carts get started but never get placed, so you will need to store them in the database permanently.
Remember, database-based session is nothing but database-based serialization. Think very carefully about what you are serializing into the database. You will have to clean it up as well, since Session_End won't fire for database-based sessions, or in fact for most out-of-proc session modes. So, essentially you are giving devs the ability to just serialize data into the database and bypass the relational model. It always results in bad coding.
With permanent relational storage, fronted by a high-performance cache like memcached, you have a much better design to support a large user base.
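To illustrate that "relational storage fronted by a cache" idea in a language-neutral sketch (table, key scheme, and TTL are all invented; SQLite stands in for the real database):

```python
import json
import sqlite3                              # stands in for the relational store
from pymemcache.client.base import Client   # pip install pymemcache

cache = Client(("localhost", 11211))
db = sqlite3.connect("shop.db")
db.execute("""CREATE TABLE IF NOT EXISTS cart_items (
                  user_id INTEGER, product_id INTEGER, qty INTEGER)""")

def get_cart(user_id):
    key = f"cart:{user_id}"                 # invented key scheme
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    rows = db.execute("SELECT product_id, qty FROM cart_items WHERE user_id = ?",
                      (user_id,)).fetchall()
    cart = [{"product_id": p, "qty": q} for p, q in rows]
    cache.set(key, json.dumps(cart), expire=300)   # short TTL; the DB stays the source of truth
    return cart

def add_to_cart(user_id, product_id, qty):
    # Write to the database first, then invalidate the cached copy.
    db.execute("INSERT INTO cart_items (user_id, product_id, qty) VALUES (?, ?, ?)",
               (user_id, product_id, qty))
    db.commit()
    cache.delete(f"cart:{user_id}")
```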
Hope this helps.