Multiple microservices sharing the same database SERVER (one container, multiple databases) - docker

Is it good practice to have multiple services connect to the same database server, with each service having its own database?
I guess having one Postgres instance is better than each service/container having its own instance.
My question is whether each service should:
1) run in its own container, with a db server instance in the same container
2) run in its own container, with the db for that service running in a separate container just for the db (multiple db containers, one per service)
3) share one db SERVER with multiple databases: one container for all databases, and all services connect to this container/db server
I understand that each service should have its own db, but does that also mean they should be completely decoupled, even server-wise?
I guess the reason I want to have one db SERVER is so that resources are not "wasted" by running multiple instances of the db server.
I also understand that having one server means that all services will be coupled hardware-wise.

It doesn’t really matter. Modern infrastructures tend not to care about the overhead of running multiple copies of the same service. Since database I/O can often be a critical performance point, you might find it more manageable to not share a database, so that you can put databases under heavier load on dedicated and/or larger hardware.
(Also consider running your database(s) on dedicated hardware, not under Docker: they’re the one thing you must back up, and you’ll update them much less often than the rest of your application stack, so their lifecycle is fundamentally different from a disposable Docker container. If you’re using a public cloud service that offers a managed database service and are willing to pay for it, that can also be a very reasonable option.)
Whatever you decide, you almost definitely need to make all of the parameters (host, database, username, password) configurable, usually via environment variables. (I see too many SO questions that have host names hard-coded in source code.) You should be able to deploy the same image with different options in development, test, and production environments, which will generally have different host names.
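As a minimal sketch of that last point, the snippet below reads the connection parameters from environment variables and builds a Postgres connection URL. The `DB_*` variable names and the `DbSettings` / `load_db_settings` helpers are hypothetical, chosen only for illustration; use whatever names your deployment already defines.

```python
import os
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class DbSettings:
    host: str
    port: int
    database: str
    user: str
    password: str

    def url(self) -> str:
        # Percent-encode credentials so special characters survive in the URL.
        user = quote(self.user, safe="")
        password = quote(self.password, safe="")
        return f"postgresql://{user}:{password}@{self.host}:{self.port}/{self.database}"


def load_db_settings(env=os.environ) -> DbSettings:
    """Read connection parameters from the environment.

    The DB_* names are hypothetical placeholders, not a standard.
    """
    return DbSettings(
        host=env.get("DB_HOST", "localhost"),
        port=int(env.get("DB_PORT", "5432")),
        database=env["DB_NAME"],      # required: fail fast if missing
        user=env["DB_USER"],          # required
        password=env["DB_PASSWORD"],  # required
    )
```

The same image can then be started with different values in development, test, and production (for example via `docker run -e DB_HOST=...`), without touching the source code.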

Yes, it is OK to have multiple services on the same server.
The deployment configuration should be guided by your operational needs (cost, performance monitoring, etc.). As long as the services are decoupled, you have the freedom to move the data around as those needs change.

Well, you can consider these:
1) If your services do not care which database server they use (that is, there is no requirement such as "Service A must use MongoDB, Service B must use MS SQL Server", etc.),
then you can go with this setup:
One Database Server -> with Multiple Databases -> where there is One Database for each Service.
2) But if it does matter, as described below:
Service A -> Postgres
Service B -> Postgres
Service C -> MS SQL Server
Service D -> MongoDB
then you will have 3 database servers: Service A & B will share the same Postgres database server (containing 2 databases, one for Service A and one for Service B), Service C will have 1 MS SQL Server (containing 1 database), and Service D will have 1 MongoDB server (containing 1 database).
Usually, you will run into this case when working with different teams (each handling its own service), where each team makes its own technology choices.
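The grouping described above can be sketched in a few lines: one server per database engine, one database per service on that server. The function name and the `<service>_db` naming scheme are assumptions made for illustration.

```python
from collections import defaultdict


def plan_servers(service_engines):
    """Group services by database engine.

    Returns {engine: [database names]}: one server per engine,
    one database per service on that server.
    """
    servers = defaultdict(list)
    for service, engine in service_engines.items():
        # Hypothetical naming scheme: each service gets "<service>_db".
        servers[engine].append(f"{service}_db")
    return dict(servers)


plan = plan_servers({
    "service_a": "postgres",
    "service_b": "postgres",
    "service_c": "mssql",
    "service_d": "mongodb",
})
# plan has three entries, one per engine; "postgres" hosts two databases.
```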


Can I have some keyspaces replicated to some nodes?

I am trying to build multiple APIs for which I want to store the data with Cassandra. I am designing it as if I would have multiple hosts, but the hosts I envision would be of two types: trusted and non-trusted.
Because of that, I have certain data which I don't want replicated to one group of hosts, while the rest of the data should be replicated everywhere.
I considered simply making one node for public data and one for protected data, but that would require the trusted hosts to run two nodes, and it would also complicate the way the API interacts with the data.
I am also building it in a docker container, and I expect that there will be frequent node creation/destruction, both trusted and non-trusted.
I want to know if it is possible to use keyspaces to achieve my required replication strategy.

You could have two datacenters, one holding your public data and the other the private data. You can configure keyspace replication to replicate that data to only one (or both) DCs. See the docs on replication for NetworkTopologyStrategy.
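As a rough sketch of what such per-keyspace replication looks like, the helper below builds a `CREATE KEYSPACE` statement using `NetworkTopologyStrategy`. The keyspace names and the data center names `dc_trusted` and `dc_public` are hypothetical; real data center names must match what your cluster's snitch reports, and the replication factors are placeholders.

```python
def create_keyspace_cql(keyspace, replicas_per_dc):
    """Build a CREATE KEYSPACE statement that replicates only to the listed DCs.

    replicas_per_dc maps a data center name to its replication factor.
    """
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")
    parts = ["'class': 'NetworkTopologyStrategy'"]
    parts += [f"'{dc}': {int(rf)}" for dc, rf in replicas_per_dc.items()]
    options = ", ".join(parts)
    return f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = {{{options}}};"


# Private data stays in the trusted DC only; public data goes to both.
private_cql = create_keyspace_cql("private_data", {"dc_trusted": 3})
public_cql = create_keyspace_cql("public_data", {"dc_trusted": 3, "dc_public": 3})
```

A data center that is not listed in a keyspace's replication map holds no replicas of that keyspace.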
However, there are security concerns here, since all the nodes need to be able to reach one another via the gossip protocol, and your client applications might also need to contact both DCs for different reads and writes.
I would suggest you look into configuring security, perhaps SSL for starters and then internal authentication. Note that Kerberos is also supported, but this might be too complex for what you need, at least for now.
You may also consider taking a look at the firewall docs to see which ports are used between nodes and from clients, so you know which ones to lock down.
Finally, as the other answer mentions, destroying and creating nodes too often is not good practice. Cassandra is designed to let you grow or shrink your cluster while it is running, but this can be a costly operation, as it involves not only streaming data to or from the node being added or removed but also other nodes shuffling token ranges around to rebalance.
You can run nodes in docker containers, but take care to avoid things like several containers all accessing the same physical resources. Cassandra is quite sensitive to I/O latency, for example, so several containers sharing the same physical disk might cause performance problems.

In short: no, you can't.
All nodes in a Cassandra cluster form a complete ring, across which your data is distributed by your selected partitioner.
You can have multiple keyspaces, plus authentication and authorization within Cassandra, and split your trusted and untrusted data into different keyspaces. Or you can go with two clusters to split your data.
From my experience, you also should not create and destroy Cassandra nodes as part of your usual daily business. Adding and removing nodes is costly and needs to be monitored, as your cluster needs to maintain replication and so on. So it might be good to separate your Cassandra clusters from your API nodes.

micro service architecture database backup and restore

I'm working on a big project based on a microservice architecture. Say I have 10 services, some of which have their own database,
and these databases use different technologies (MySQL, MongoDB, Elastic, ...).
So what is the best practice for backing up and restoring a collection of services?
The real problem is that these databases are related to each other. For example, in my logic backend server I keep the oauthId of each user, which comes from the OAuth server.
Now suppose I restore these two databases separately, and my users db in the logic server ends up containing some users that have no related records on the OAuth server.
Just for your information, I'm using docker, docker-compose, and docker swarm for my service orchestration.

As an idea: check how your services depend on each other. If your dependencies are acyclic, you might be able to back up all your data outside-in or inside-out without running into consistency issues.
Doing so would guarantee that, after your restore, no elements in a service are left depending on missing data in an inner one.
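One way to check for acyclic dependencies and derive an order is a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the service names are hypothetical, and whether you walk the resulting order forwards or backwards for backup versus restore depends on how your data is written and deleted, so treat the direction as something to verify for your own system.

```python
from graphlib import CycleError, TopologicalSorter


def dependency_order(depends_on):
    """Return services so that each one appears after everything it depends on.

    depends_on maps a service to the set of services it references.
    Raises ValueError if the dependencies contain a cycle.
    """
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as exc:
        # The detected cycle is the second element of the exception args.
        raise ValueError(f"cyclic dependency: {exc.args[1]}") from exc


# Hypothetical example: the logic backend stores ids issued by the OAuth server.
inner_first = dependency_order({"logic": {"oauth"}, "oauth": set()})
outer_first = list(reversed(inner_first))
```

If `dependency_order` raises, the dependencies are cyclic and this ordering approach does not apply.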
If your services have cyclic dependencies, you might be better served by running each service redundantly (e.g. master-slave replication). Then you can take the slave instances offline and back up the whole lot of slaves while they are down. That would allow you to create an atomic backup across all services. However, the quality of your backup then depends on the quality of the master-slave replication of each service.
Lastly, you could keep a record of changes per service, plus a full backup. You can then roll back to the full backup and start applying the record of changes until you reach a consistent state across the service instances. I think that requires logical dependencies (e.g. a request identifier) that let you correlate the change records, i.e. apply them across the services without the risk of applying them in a way that violates the logical dependencies that occurred when clients actually interacted with your services.
I hope these ideas can help you solve your problem :)

MongoDB: Different applications connecting to different replicas

We use MongoDB as the central database for our application, a consumer-facing mobile app. At present it's a 7-member replica set, with replica-set-1 being the master at the moment. The backend which connects to the Mongo replica set is built in Ruby on Rails, and we use mongoid as the ODM.
There are mainly 3 pieces connecting to the MongoDB replica set:
1) The consumer application
2) The admin and customer care management application
3) The data retrieval application (for analytics and similar purposes)
All 3 apps connect to the same replica set as of now.
What I would like to know is whether it is possible to connect different applications to specific replicas.
For example, the mobile app connects to the primary for writes and to replicas 2-4 for reads; the customer care management application connects to the primary (for writes) and to replicas 5-7 for reads.
I don't think explicitly listing specific replicas in the mongoid.yml configuration is working. Even though I have listed only replica-set-7 in the mongoid hosts file for the data retrieval application, I do see certain queries in the log files of replica-set-2 and 3.
So evidently, MongoDB decides how to distribute the queries among its replicas regardless of the configuration specified at the client (mongoid) end.
I would really love to know if such a thing is possible at all using MongoDB and mongoid, as it would help us solve a lot of our load issues. Right now, heavy queries from the customer care and data retrieval apps affect the consumer-facing mobile app as well, as the reads are not segregated. So basically I would like to separate out the reads.
Also, if this is possible at all, I would like to hear about any possible pitfalls, especially given that all 3 applications can write to the DB. For example, replica-3 suddenly becomes the primary after an election and it is not explicitly mentioned in the configuration of the data retrieval application. What might happen there is a concern.
I am not at all sure whether this is possible, but I wanted to know if there's a way to do it. Any help would be really appreciated.

When you connect to any member of a replica set, the client is told the full state of the replica set and can connect to any of its members. The initial set of hosts is just the seed list for that process: as long as your application can reach one of those hosts, it doesn't matter which hosts are in that configuration.
Mongo does have the concept of tagged replica set members. When creating a connection or executing a query you can specify the tags to use to select the replica set member to read from.
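To illustrate the tag-based approach, here is a small sketch that builds a connection string using the standard `readPreference` and `readPreferenceTags` URI options. It assumes the replica set members have already been given tags in the replica set configuration; the host names, the replica set name `rs0`, and the tag `use:analytics` are made up for the example. Mongoid has its own way of expressing read preference in `mongoid.yml`, which is not shown here.

```python
def read_tagged_uri(seeds, replica_set, tags, mode="secondaryPreferred"):
    """Build a MongoDB URI that routes reads to members matching the given tags.

    seeds: list of "host:port" strings (only used to discover the set).
    tags:  dict of tag name -> value that target members must carry.
    """
    if mode == "primary":
        # Tag sets cannot be combined with the "primary" read preference mode.
        raise ValueError("read preference 'primary' does not support tags")
    tag_set = ",".join(f"{name}:{value}" for name, value in tags.items())
    hosts = ",".join(seeds)
    return (
        f"mongodb://{hosts}/?replicaSet={replica_set}"
        f"&readPreference={mode}&readPreferenceTags={tag_set}"
    )


# Hypothetical: the data retrieval app reads only from members tagged use=analytics.
analytics_uri = read_tagged_uri(
    ["db1.example.net:27017", "db2.example.net:27017"],
    "rs0",
    {"use": "analytics"},
)
```

Writes still go to whichever member is currently primary, so an election does not require changing this configuration.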

FoundationDB, the layer: Is it hosted on client application or server nodes?

Recently I was reading about the concept of layers in FoundationDB. I like their idea: decoupling storage on one side from access to it on the other.
Some points regarding the implementation of layers are unclear to me, especially how they communicate with the storage engine. There are two possible answers: they are part of the server nodes and communicate with the storage through fast native API calls (e.g. as linked modules hosted in the server process), OR they are hosted inside the client application and communicate through a network protocol. For example, the SQL layer of many RDBMSs is hosted on the server. How do things work in FoundationDB?
PS: These two cases differ in terms of performance, especially when client-server communication is high-latency.

To expand on what Eonil said: the answer rests on the distinction between two different senses of "client" and "server".
Layers are not run within the database server processes. They use the FDB client API to make requests of the database, and do not (with one exception*) get to pierce the transactional key-value abstraction.
However, there is nothing stopping you from running the layers on the same physical (or virtual) server machines as the database server processes. And, as that post from the community site mentions, there are use cases where you might very much wish to do this in order to minimize latencies.
*The exception is the Locality API, which is mostly useful in exactly those cases where you want to co-locate client-side layers with the data on which they operate.

Layers are a feature built on top of the client-side library.
Cited from http://community.foundationdb.com/questions/153/what-layers-do-you-want-to-see-first
That's a good question. One reason that it doesn't always make sense
to run layers on the server is that in a distributed database, that
data is scattered--the servers themselves are a network hop away from
a random piece of data, just like the client.
Of course, for something like an analytics layer which is aware of
what data each server contains, it makes sense to run a distributed
version co-located with each of the machines in the FDB cluster.

offline web application design recommendation

I want to know which architecture is best to adopt in this case:
I have many shops that connect to a web application developed using Ruby on Rails.
The internet is not reachable all the time.
The solution was to develop an offline system, which requires installing a local copy of the remote database.
All of this has already been developed.
Now what I want to do:
1) Always work on the local copy of the database.
2) Any change to the local database should be synchronized with the remote database.
3) All the local copies should contain the same data as the other local copies.
To solve this problem, I thought about using JMS-like software, possibly RabbitMQ.
This would consist of pushing every SQL request into a queue that is consumed by the remote instance of the application, which inserts into the remote DB and pushes the insert or SQL statement into another queue that is read by all the local instances. This seems complicated and would probably slow down the application.
Is there a design or recommendation that I should apply to solve this kind of problem?

You can do that, but essentially you are developing your own replication engine. Those things can be a bit tricky to get right (what happens if m1 and m3 are executed on replica r1, but m2 isn't?). I wouldn't want to develop something like that unless you are sure you have the resources to make it work.
I would look into existing off-the-shelf replication solutions. If you are already using a SQL DB, it probably has some support for replication. Look here for more details if you are using MySQL.
Alternatively, if you are willing to explore other backends, I have heard that CouchDB has great support for replication. I have also heard of people using git libraries to do that sort of thing.
Update: After your comment, I realize you already use MySQL replication and are looking for a solution for re-syncing the databases after being offline.
Even in that case RabbitMQ doesn't help you at all, since it requires a constant connection to work, so you are back to square one. The easiest solution would be to just write all the changes (SQL commands) into a text file at the remote location; then, when the connection comes back, copy that file (scp, ftp, email or whatever) to the master server, run all the commands there, and then resync all the replicas.
Depending on your specific project, you may also need to make sure there are no conflicts when running commands from different remote locations, but there is no general technical solution to this. Again, depending on the project, you may want to cancel one of the transactions, notify the users that it happened, and so on.
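The write-to-a-file idea from the update can be sketched roughly as follows. Two things here are assumptions rather than part of the answer: statements are stored as one JSON object per line (so parameters stay separate from the SQL text), and `sqlite3` stands in for the master database purely so the example is self-contained. Conflict handling, as noted above, is project-specific and is not covered.

```python
import json
import sqlite3


def journal_write(path, sql, params=()):
    """Append one statement to the local journal file (one JSON object per line)."""
    with open(path, "a", encoding="utf-8") as journal:
        journal.write(json.dumps({"sql": sql, "params": list(params)}) + "\n")


def replay_journal(path, conn):
    """Run every journaled statement on the master, in order.

    All statements run in one transaction: commit on success, roll back on error.
    Returns the number of statements applied.
    """
    with open(path, encoding="utf-8") as journal:
        entries = [json.loads(line) for line in journal if line.strip()]
    with conn:
        for entry in entries:
            conn.execute(entry["sql"], entry["params"])
    return len(entries)
```

While offline, a shop would call `journal_write` for every change; once the file has been copied to the master server, `replay_journal` applies it there, after which the replicas can be resynced.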

I would recommend taking a look at CouchDB. It's a non-SQL database that does exactly what you are describing automatically. It's used especially in phone applications that often don't have internet or data connectivity. The idea is that you have a local copy of a CouchDB database and one or more remote CouchDB databases. The CouchDB server then takes care of the replication of the distributed systems, and you always work off your local database. This approach is nice because you don't have to build your own distributed replication engine. For more details I would take a look at the 'Distributed Updates and Replication' section of their documentation.
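For a sense of what triggering such a replication looks like, the sketch below prepares (but does not send) a request to CouchDB's `_replicate` endpoint using only the standard library. The server and database URLs are hypothetical, authentication is omitted, and the exact options you need should be checked against the CouchDB documentation for your version.

```python
import json
from urllib.request import Request


def replication_request(local_server, source_url, target_url, continuous=True):
    """Prepare a POST to <local_server>/_replicate copying source_url to target_url."""
    body = {"source": source_url, "target": target_url, "continuous": continuous}
    return Request(
        url=f"{local_server.rstrip('/')}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical shop setup: push local changes to the central database.
push = replication_request(
    "http://localhost:5984",
    "http://localhost:5984/shop",
    "https://central.example.com/shop",
)
# Sending it would be: urllib.request.urlopen(push)
```

Replication is one-directional, so keeping a shop and the central database in sync both ways takes a second request with source and target swapped.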
