We use MongoDB as the central database for our application, a consumer-facing mobile app. At present it is a 7-member replica set, with replica-set-1 currently acting as the primary. The backend that connects to the Mongo replica set is built with Ruby on Rails, and we use Mongoid as the ODM.
There are mainly three applications connecting to the MongoDB replica set:
The consumer application
The Admin and customer care management application
The Data retrieval application (for analytics and similar purposes)
All three apps currently connect to the same replica set.
What I would like to know is whether it is possible to connect different applications to specific replica set members.
For example, the mobile app would connect to the primary for writes and to replicas 2-4 for reads, while the customer care management application would connect to the primary (for writes) and to replicas 5-7 for reads.
I don't think explicitly listing specific replicas in the mongoid.yml configuration works. Even though I have listed only replica-set-7 in the Mongoid hosts configuration for the data retrieval application, I still see some of its queries in the log files of replica-set-2 and replica-set-3.
So apparently MongoDB decides how to distribute queries among its replicas regardless of the hosts specified on the Mongoid client side.
I would really love to know whether such a thing is possible at all using MongoDB and Mongoid, as it would help us solve a lot of our load issues. Right now heavy queries from the customer care and data retrieval apps affect the consumer-facing mobile app as well, since the reads are not segregated. So basically I would like to separate out the reads.
Also, if this is possible at all, I would be concerned about potential pitfalls, especially since all three applications can write to the DB. For example, suppose replica-set-3 suddenly becomes the primary after an election and it is not explicitly listed in the configuration of the data retrieval application; what happens then would be a concern.
I am not at all sure whether this is possible, but I just wanted to know if there is a way to achieve it. Any help would be really appreciated.
When you connect to any member of a replica set, the client is told the full state of the replica set and can connect to any of them. The initial set of hosts is just the seed list for that process; as long as your application can reach one of those hosts, it doesn't matter which hosts are in that configuration.
Mongo does have the concept of tagged replica set members. When creating a connection or executing a query you can specify the tags to use to select the replica set member to read from.
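For example, if the members meant for analytics were tagged (via rs.reconfig() on the replica set) with something like { "use": "analytics" }, the data retrieval app could request them through its read preference. A minimal sketch of the mongoid.yml side, assuming a reasonably recent Mongoid/Ruby driver; the hosts, replica set name, and tag are illustrative:

```yaml
# mongoid.yml for the data retrieval application (hypothetical names)
production:
  clients:
    default:
      database: app_production
      hosts:
        - replica-set-1.example.com:27017   # any reachable seed will do
        - replica-set-7.example.com:27017
      options:
        replica_set: rs0
        read:
          mode: :secondary_preferred        # prefer secondaries for reads
          tag_sets:
            - use: analytics                # only members tagged { use: "analytics" }
```

Note that tag sets only steer reads; writes always go to whichever member is currently primary, which also addresses the failover concern raised above.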
There are a couple of confusing points in the documentation that make it hard for me to understand how exactly distribution across the cluster happens in Orleans. Hence the questions.
Question #1
Orleans claims to have built-in capabilities to distribute work across multiple servers. To me it sounds like Orleans can act as a load balancer itself and can scale out automatically. Thus, if I deploy an Orleans app to several servers, service discovery and load management should happen automatically, correct?
In that case, why do some docs and articles suggest using other tools, like Ocelot or Consul, as a single entry point to the Orleans cluster?
Question #2
I would like to use a simple but distributed in-memory store spanning several servers, like Redis or Apache Ignite, and I would like to know if it's possible to use a simple grain as this kind of data storage.
Let's say one grain stores a collection of restaurants and another grain keeps track of the last 1000 visitors for a selected restaurant. Can I activate these two grains only once, as singleton collections, add or remove records in each collection, and use these two grains as in-memory storage equally available to all nodes in the cluster? Also, if the answer is yes, do I need to add locks to these collections, or does each grain always run single-threaded?
Service discovery and load management happen automatically indeed.
Consul is not a strict requirement. The only external requirement is a membership table provider, something that is used internally by Orleans clustering. Several membership table providers come built in with Orleans; Azure Table storage, for example. All you need is to configure Orleans to use it and, of course, have an Azure storage account. Consul is simply another membership table provider, and there are more.
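As a rough sketch of what that configuration looks like (assuming Orleans 3.x and the Microsoft.Orleans.Clustering.AzureStorage package; the connection string, cluster ID, and service ID are placeholders):

```csharp
using Microsoft.Extensions.Hosting;
using Orleans.Configuration;
using Orleans.Hosting;

var host = new HostBuilder()
    .UseOrleans(silo =>
    {
        silo
            // Membership table provider: Azure Table storage (requires a storage account).
            .UseAzureStorageClustering(options =>
                options.ConnectionString = "DefaultEndpointsProtocol=...")   // placeholder
            .Configure<ClusterOptions>(options =>
            {
                options.ClusterId = "my-cluster";
                options.ServiceId = "MyApp";
            });
    })
    .Build();

await host.RunAsync();
```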
Another thing that does not come built in is infrastructure scaling. If demand for your service increases, something needs to ask the infrastructure provider (the cloud provider) to add more servers. Once servers are added, Orleans will automatically rebalance the workload across the new servers as well. But figuring out that more servers are needed and adding them is not done by Orleans itself (there are likely some externally contributed tools for that; maybe Kubernetes can be configured to do it? I am not completely sure about that).
Yes, you can use those two grains as in-memory storage, just like you wrote. And no, you do not need to use locks: all grains are single-threaded.
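A minimal sketch of the restaurant-collection grain (the interface, class, and method names are illustrative). Because every caller asks for the same fixed key, the whole cluster effectively shares one activation, and since a grain processes one request at a time, the plain list needs no locking:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Orleans;

public interface IRestaurantDirectoryGrain : IGrainWithIntegerKey
{
    Task Add(string restaurant);
    Task<IReadOnlyList<string>> GetAll();
}

public class RestaurantDirectoryGrain : Grain, IRestaurantDirectoryGrain
{
    // Plain in-memory state; grain turns are single-threaded, so no locks are needed.
    private readonly List<string> _restaurants = new List<string>();

    public Task Add(string restaurant)
    {
        _restaurants.Add(restaurant);
        return Task.CompletedTask;
    }

    public Task<IReadOnlyList<string>> GetAll() =>
        Task.FromResult<IReadOnlyList<string>>(_restaurants.AsReadOnly());
}

// Every caller uses the same key (here 0), so there is effectively a single instance:
// var directory = grainFactory.GetGrain<IRestaurantDirectoryGrain>(0);
```

Keep in mind that the data lives only in that activation's memory; if the grain is deactivated or its silo goes down, the collection is lost unless you also add grain persistence.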
Is it good practice to have multiple services connect to the same database server, with each service having its own database?
I guess having one Postgres instance is better than each service/container having its own instance.
My question is: should each service
run in its own container, with the DB server instance in the same container,
run in its own container, with the DB for that service running in a separate container dedicated to that DB (multiple DB containers, one per service), or
use one DB SERVER with multiple databases, all in one container, with all services connecting to this container/DB server?
I understand that each service should have its own DB, but does that also mean they should be completely decoupled even server-wise?
I guess the reason I want to have one DB SERVER is so that resources are not "wasted" on multiple instances of the DB server running.
I also understand that having one server means all services are coupled hardware-wise.
It doesn’t really matter. Modern infrastructures tend not to care about the overhead of running multiple copies of the same service. Since database I/O can often be a critical performance point, you might find it more manageable to not share a database, so that you can put databases under heavier load on dedicated and/or larger hardware.
(Also consider running your database(s) on dedicated hardware, not under Docker: they’re the one thing you must back up, and you’ll update them much less often than the rest of your application stack, so their lifecycle is fundamentally different from a disposable Docker container. If you’re using a public cloud service that offers a managed database service and are willing to pay for it, that can also be a very reasonable option.)
Whatever you decide, you almost definitely need to make all of the parameters (host, database, username, password) configurable, usually via environment variables. (I see too many SO questions that have host names hard-coded in source code.) You should be able to deploy the same image with different options in development, test, and production environments, which will generally have different host names.
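As a concrete illustration of the one-server/one-database-per-service option combined with environment-variable configuration (a hypothetical docker-compose sketch; the image names, credentials, and database names are made up), one Postgres server container can host a database per service, created by an init script:

```yaml
version: "3.8"
services:
  db:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: example
    volumes:
      - db-data:/var/lib/postgresql/data
      # Runs once on first start, e.g. CREATE DATABASE orders; CREATE DATABASE billing;
      - ./create-databases.sql:/docker-entrypoint-initdb.d/create-databases.sql

  orders:
    image: example/orders-service
    environment:              # everything needed to reach this service's own database
      DB_HOST: db
      DB_NAME: orders
      DB_USER: postgres
      DB_PASSWORD: example

  billing:
    image: example/billing-service
    environment:
      DB_HOST: db
      DB_NAME: billing
      DB_USER: postgres
      DB_PASSWORD: example

volumes:
  db-data:
```

In test or production the same service images would simply receive different DB_HOST values (a dedicated server or a managed database) without any code change.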
Yes, it is OK to have multiple services on the same server.
The deployment configuration should be guided by your operational needs (cost, performance, monitoring, etc.). As long as the services are decoupled, you have the freedom to move the data around based on those needs.
Well, you can consider these two setups:
1) If your services don't care which database server they use (i.e. there is no requirement such as "Service A must use MongoDB, Service B must use MS SQL Server", etc.),
then you can go with this setup:
one database server -> multiple databases -> one database for each service.
2) But if you find it does matter, as described below:
Service A -> Postgres
Service B -> Postgres
Service C -> MS SQL Server
Service D -> MongoDB
then you will have three database servers: Services A & B share the same Postgres server (containing two databases, one for Service A and one for Service B), Service C has its own MS SQL Server (containing one database), and Service D has its own MongoDB server (containing one database).
Usually you will run into this case when working with different teams (one per service), where each team makes its own technology choices.
I'm working on a big project based on a microservice architecture, so consider that I have 10 services, some of which have their own databases.
These databases use different technologies (MySQL, MongoDB, Elasticsearch, ...).
So what is the best practice for backing up and restoring this collection of services?
The real problem is that these databases are related to each other. For example, in my logic backend server I keep the oauthId of each user, which comes from the OAuth server.
Now consider restoring these two databases separately: the users DB on the logic server could end up containing users that have no related records on the OAuth server.
Just for your information, I'm using Docker, docker-compose, and Docker Swarm for my service orchestration.
As an idea: check how your services depend on each other. If the dependencies are acyclic, you might be able to back up all your data outside-in or inside-out without running into consistency issues.
Doing so would guarantee that, after your restore, no service contains records that reference something missing from a service it depends on.
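For instance, with the logic backend depending on the OAuth service, that ordering can be as simple as a script that always dumps the depending service before the service it depends on. A hedged sketch, assuming for illustration that the logic DB is MongoDB and the OAuth DB is MySQL; the container names, credentials, and paths are hypothetical:

```sh
#!/bin/sh
# Dump the dependent service first, then its dependency: every oauthId the
# logic DB references at dump time will also exist in the (later) OAuth dump,
# as long as OAuth records are not deleted in between.
set -e
STAMP=$(date +%Y%m%dT%H%M%S)
mkdir -p "backup/$STAMP"

# 1. Logic backend (MongoDB) -- depends on the OAuth service
docker exec logic-mongo mongodump --archive > "backup/$STAMP/logic.archive"

# 2. OAuth service (MySQL) -- dumped last
docker exec oauth-mysql sh -c 'exec mysqldump -uroot -p"$MYSQL_ROOT_PASSWORD" oauth' \
  > "backup/$STAMP/oauth.sql"
```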
If your services have cyclic dependencies, you might be better served by running each service's database redundantly (e.g. master-slave replication). Then you can take down the slave instances and take a backup from the whole set of slaves while they are offline. That would allow you to create an atomic backup across all services. However, the quality of the backup then depends on the quality of the master-slave replication at each service.
Lastly, you could keep a record of changes per service, plus a full backup. You can then restore the full backup and start applying the recorded changes until you reach a consistent state across the service instances. I think that requires logical identifiers (e.g. a request identifier) that allow you to correlate the change records, i.e. apply them across the services without the risk of applying them in a way that violates the logical dependencies that occurred when clients actually interacted with your services.
I hope these ideas can help you solve your problem :)
I have a Grails app running with a single RabbitMQ node. It is great. I want to fire up the same app a second time on the same machine on a different port. Currently, both apps answer jobs from both apps. I want their RabbitMQ setups to be independent. What is the easiest way to ensure that each app only responds to the messages it sends? Multiple RabbitMQ queues?
You can provide a virtual host entry in the Grails configuration:
rabbitmq.connectionfactory.virtualHost — the name of the virtual host to connect to.
Define two different vhosts in RabbitMQ, and each Grails app will have its very own configured area to use. Messages sent through one vhost are only available on that vhost, effectively separating the two Grails apps without having to change the queue setup or other internal parts of each app; only the connection configuration changes.
Remember that access control is performed on a per-vhost basis, so you'll have to give your user access to each vhost in RabbitMQ.
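A rough sketch of what that looks like, assuming the classic Grails rabbitmq plugin configuration style; the username, password, and vhost names are placeholders. Each app instance gets its own virtualHost value:

```groovy
// grails-app/conf/Config.groovy of the first instance
rabbitmq {
    connectionfactory {
        username    = 'grails_user'
        password    = 'secret'
        hostname    = 'localhost'
        virtualHost = 'app_one'   // the second instance uses 'app_two'
    }
}
```

On the broker side, create the vhosts and grant the user access to each one, e.g. rabbitmqctl add_vhost app_one followed by rabbitmqctl set_permissions -p app_one grails_user ".*" ".*" ".*" (and the same for app_two).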
As @fiskfisk said, multiple vhosts are an option, and that would work particularly well if you have a complex set of queues, exchanges, and bindings. There are some downsides to using a new vhost for the second application, including duplicated access control management as well as some minor performance overhead.
If you have a fairly simple queue/exchange/binding setup, I would suggest pointing the second app at a queue with a different name, or giving your app the ability to be configured at runtime to either use a different queue or to leverage topic-based routing within RabbitMQ, with each app flagging its messages with an app-specific prefix (or something similar).
One advantage of using topic routing to differentiate apps is that you can easily dip into the full stream of messages and do other things with that stream that you didn't foresee initially, including things like archival logging or audit logging, as well as other metrics collection or analysis.
tl;dr:
For long-term flexibility, have each instance of your application send messages to queues based on topic routing (see the sketch below).
For quick-and-dirty / get-it-working-yesterday, use a separate vhost for each instance of your application.
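To make the topic-routing option concrete, here is a minimal sketch using the plain RabbitMQ Java client from Groovy (the exchange, queue, and prefix names are illustrative, and this is independent of whatever the Grails plugin sets up): each app instance binds its own queue to a shared topic exchange with an app-specific routing-key prefix, so it only consumes its own messages, while anything else can still tap the full stream with a wider binding such as "#".

```groovy
@Grab('com.rabbitmq:amqp-client:5.20.0')
import com.rabbitmq.client.ConnectionFactory

def factory = new ConnectionFactory(host: 'localhost')
def conn = factory.newConnection()
def ch = conn.createChannel()

ch.exchangeDeclare('jobs', 'topic', true)          // shared, durable topic exchange

// Instance one only receives routing keys starting with "app1."
ch.queueDeclare('app1.jobs', true, false, false, null)
ch.queueBind('app1.jobs', 'jobs', 'app1.#')

// Publish with the instance-specific prefix
ch.basicPublish('jobs', 'app1.report.generate', null, 'payload'.bytes)

ch.close()
conn.close()
```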
I want to know which architecture is best to adopt for this case:
I have many shops that connect to a web application developed using Ruby on Rails.
The internet is not reachable all the time.
The solution was to develop an offline mode, which requires installing a local copy of the remote database.
All of this was already developed.
Now what I want to do:
Always work on the local copy of the database.
Any change to the local database should be synchronized with the remote database.
All the local copies should contain the same data as the other local copies.
To solve this problem I thought about using JMS-style messaging, possibly RabbitMQ.
This would consist of pushing every SQL statement into a queue to be executed on the remote instance of the application, which would apply it to the remote DB and then push the statement into another queue read by all the local instances. This seems complicated and would probably slow down the application.
Is there a design pattern or recommendation I should apply to solve this kind of problem?
You can do that, but essentially you are developing your own replication engine. Those things can be a bit tricky to get right (what happens if m1 and m3 are executed on replica r1, but m2 isn't?). I wouldn't want to develop something like that unless you are sure you have the resources to make it work.
I would look into an existing off-the-shelf replication solution. If you are already using a SQL database, it probably has some support for this. Look here for more details if you are using MySQL.
Alternatively, if you are willing to explore other backends, I heard that CouchDB has great support for replication. I also heard of people using git libraries to do that sort of thing.
Update: after your comment, I realize you already use MySQL replication and are looking for a solution for re-syncing the databases after being offline.
Even in that case RabbitMQ doesn't help you at all, since it requires a constant connection to work, so you are back to square one. The easiest solution would be to just write all the changes (SQL commands) into a text file at the remote location, then, when the connection comes back, copy that file (scp, ftp, email or whatever) to the master server, run all the commands there, and then resync all the replicas.
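A bare-bones sketch of that flow from the shop side (the hostnames, paths, and database name are made up; it assumes the local app appends every write as a SQL statement to changes.sql while offline):

```sh
#!/bin/sh
# Run when connectivity returns: ship the offline change log to the master,
# replay it there, then let normal MySQL replication resync the replicas.
set -e

scp /var/app/offline/changes.sql deploy@master.example.com:/tmp/shop42-changes.sql
ssh deploy@master.example.com \
  'mysql --defaults-extra-file=/etc/mysql/replay.cnf app_production < /tmp/shop42-changes.sql'

# Archive the applied log so it is not replayed twice.
mv /var/app/offline/changes.sql "/var/app/offline/applied-$(date +%s).sql"
```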
Depending on your specific project, you may also need to make sure there are no conflicts when running commands coming from different remote locations, but there is no general technical solution to this. Again, depending on the project, you may want to cancel one of the conflicting transactions, notify the users that it happened, and so on.
I would recommend taking a look at CouchDB. It's a non-SQL database that does exactly what you are describing, automatically. It's used especially in phone applications that often don't have internet or data connectivity. The idea is that you have a local copy of a CouchDB database and one or more remote CouchDB databases. The CouchDB server then takes care of the replication between the distributed systems, and you always work off your local database. This approach is nice because you don't have to build your own distributed replication engine. For more details, take a look at the 'Distributed Updates and Replication' section of their documentation.
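Triggering replication is just an HTTP call against CouchDB's _replicate endpoint (or a document in the _replicator database for a persistent job). A rough sketch with hypothetical database names and hosts:

```sh
# Continuously push local changes from the shop's CouchDB to the central one;
# run a second request with source and target swapped to pull changes back down.
curl -X POST http://localhost:5984/_replicate \
  -H 'Content-Type: application/json' \
  -d '{
        "source": "shop_db",
        "target": "https://central.example.com:6984/shop_db",
        "continuous": true
      }'
```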