Should instances of a horizontally scaled microservice share DB? - docker

Given a microservice that owns a relational database and needs to scale horizontally, I see two approaches to provisioning the database server:
provide each instance of the service with its own DB server instance, with a coupled process lifecycle
OR
have the instances connect to a shared (by identical instances of the same service) independent DB server or cluster
With an event-driven architecture and the former approach, each instance of the microservice would need to process each event and take the appropriate action to mutate its own isolated state. This seems inefficient.
Taking the latter approach, only one instance has to process the event to achieve the same effect, but as a mutation of the shared state. One must ensure each event is processed by only one instance of the given microservice (is this trivial?) to avoid conflicts.
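On the "processed by only one instance" point: with a broker like Kafka, this is what consumer groups give you out of the box. Below is a minimal sketch using the Confluent.Kafka client (the topic, group, and broker names are placeholders); every instance that joins the same group gets a disjoint share of the partitions, so each event reaches exactly one instance.

```csharp
// Sketch: competing consumers via a shared Kafka consumer group.
// Topic, group, and server names are placeholders.
using Confluent.Kafka;

var config = new ConsumerConfig
{
    BootstrapServers = "broker:9092",
    GroupId = "my-service",            // same group on every instance
    AutoOffsetReset = AutoOffsetReset.Earliest
};

using var consumer = new ConsumerBuilder<string, string>(config).Build();
consumer.Subscribe("my-service-events");

while (true)
{
    var result = consumer.Consume();   // only one group member sees each event
    HandleEvent(result.Message.Value); // mutate the shared DB here
}

static void HandleEvent(string payload) { /* hypothetical handler */ }
```

Conversely, under the DB-per-instance approach each instance would use a unique GroupId, so that every replica sees every event and can update its own copy of the state.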
Is there a consensus on the preferred approach here? What lessons has your experience taught you on this?

I would go with the first approach: a service-local DB, where each instance has its own DB instance. This makes it possible to change the persistence layer between versions of the service.
Changing the ER model would otherwise lead to conflicts between instances. This approach would also let you switch to a NoSQL solution easily.
For the event-driven design, I can recommend this book: Designing Event-Driven Systems
As I see it, a service instance receives a request that leads to an Event. This Event is consumed by the other instances of the service, so the request doesn't need to be processed again, but the result has to be copied into each instance's state.
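To illustrate (the event and state shapes here are invented): the request is processed once, an event describing the result is published, and each instance applies that event to its own store, so the replicas converge without re-running the request logic.

```csharp
using System.Collections.Generic;

// Hypothetical event produced once, then applied by every instance
// against its own local store.
public record PriceChanged(string Sku, decimal NewPrice);

public class LocalCatalog
{
    private readonly Dictionary<string, decimal> _prices = new();

    // Applying an event is cheap and deterministic, unlike
    // re-processing the original request.
    public void Apply(PriceChanged e) => _prices[e.Sku] = e.NewPrice;
}
```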

Related

Why are read-only nodes called read-only in the case of data store replication?

I was going through the article https://learn.microsoft.com/en-us/azure/architecture/patterns/cqrs, which says, "If separate read and write databases are used, they must be kept in sync". One obvious benefit I can understand from having separate read replicas is that they can be scaled horizontally. However, I have some doubts:
It says, "Updating the database and publishing the event must occur in a single transaction". My understanding is that there is no guarantee the updated data will be available immediately on the read-only nodes, because that depends on when the event is consumed by the read-only nodes. Did I get that right?
Data must first be written to the read-only nodes before it can be read from them, i.e. write operations are also performed on the read-only nodes. So why are they called read-only nodes? Is it because the writes on these nodes are performed not directly by the data-producer application, but rather by some serverless function (e.g. an AWS Lambda or Azure Function) that picks up the event from the topic (e.g. a Kafka topic) to which the write node has sent it?
Is the data sharded across the read-only nodes, or does every read-only node have the complete set of data?
All of these have "it depends"-like answers...
Yes, usually, although some implementations might choose to (try to) update the read models transactionally with the update. With multiple nodes you're quickly forced to learn the CAP theorem, though, and so in many CQRS contexts eventual consistency is simply accepted as a feature, as the gains from tolerating it usually significantly outweigh the losses.
I suspect the bit you quoted actually refers to transactionally updating the write store together with publishing the event. Even this can be difficult to achieve, and it is one of the problems event sourcing seeks to solve.
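One common way to get that atomicity is the outbox pattern: commit the state change and a pending-event row in the same local transaction, and let a separate relay publish from the outbox afterwards. A rough sketch, with invented table names and schema:

```csharp
using Microsoft.Data.SqlClient;

public static class ProductWriter
{
    // Commit the state change and the pending event atomically;
    // a separate relay process publishes rows from the Outbox table.
    public static void UpdateAndRecordEvent(string connStr, string sku, decimal price)
    {
        using var conn = new SqlConnection(connStr);
        conn.Open();
        using var tx = conn.BeginTransaction();

        using (var update = new SqlCommand(
            "UPDATE Products SET Price = @p WHERE Sku = @s", conn, tx))
        {
            update.Parameters.AddWithValue("@p", price);
            update.Parameters.AddWithValue("@s", sku);
            update.ExecuteNonQuery();
        }

        using (var outbox = new SqlCommand(
            "INSERT INTO Outbox (Type, Payload) VALUES ('PriceChanged', @e)", conn, tx))
        {
            outbox.Parameters.AddWithValue("@e", $"{{\"sku\":\"{sku}\",\"price\":{price}}}");
            outbox.ExecuteNonQuery();
        }

        tx.Commit(); // both rows commit, or neither does
    }
}
```

A relay then reads and publishes those rows, trading the dual-write problem for at-least-once delivery.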
Yes. It's trivially obvious, in this context, that data must be written before it can be read, but your apps, as consumers of the data, see these nodes as read-only.
Both are valid outcomes. Usually this part is less an application concern and is more often delegated to the capabilities of your chosen read-model infrastructure (Mongo, Cosmos, Dynamo, etc.).

Calling specific instances of a docker service

Not exactly sure how to ask this question, or if this is a valid approach. I am learning all about Docker, containers, etc. From what I have read, it is great for creating individual microservices that perform various tasks, such as BasketService, CartService, etc., each contained in its own Docker container on a VM. If hosted on a Linux VM, I think the URL calls from my UI would be something along the lines of https://MyLinuxVM/BasketService/{controller}.
My Question:
Now let's say I have only 1 service. We will call it MyService, and it needs multiple instances. So I could have 4 instances, e.g. MyService1, MyService2, MyService3, MyService4, all exactly the same. From my client, would the following assumption be correct?
Can I call https://MyLinuxVM/MyService1/{controller} or https://MyLinuxVM/MyService2/{controller} to send to a specific container instance?
Why:
I feel this may help explain why I am doing this, and possibly help everyone understand my problem in the first place. I have 4 physical devices I need to communicate with. We will call them Device1, Device2, Device3, Device4. Each device has its own IP address and its own set of "Tools" connected to it on various ports of the device (10-20 ports per device).
From our UI, the users can click a button that sets some torque values for the tool in their hand. The data is sent to the MVC backend, which forwards it to the "correct" background worker/container, which then transforms the data into a byte[] and passes it along to its dedicated device. I am not sure if I need multiple background workers in a single container, or just a single configurable container with a single background worker that gets deployed multiple times depending on the number of devices we have running in the shop.
I have read a lot about creating different worker services that do different tasks, but I need multiple instances of a worker service that can be configured (preferably from DB tables) to send to a specific device; a sketch of this idea is below.
Picture for additional details / visual:
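For what it's worth, the "single configurable container deployed multiple times" option could look roughly like this: one worker image that reads its device assignment from configuration at startup. The variable names, host, and port below are all made up, and the same values could equally be loaded from your DB tables.

```csharp
// Generic device worker: the image is identical for every deployment,
// only the configuration differs. DEVICE_HOST / DEVICE_PORT are
// hypothetical settings.
using System;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;

public class DeviceWorker : BackgroundService
{
    private readonly string _host =
        Environment.GetEnvironmentVariable("DEVICE_HOST") ?? "device1.local";
    private readonly int _port =
        int.Parse(Environment.GetEnvironmentVariable("DEVICE_PORT") ?? "5000");

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        using var client = new TcpClient();
        await client.ConnectAsync(_host, _port, stoppingToken);
        // ... receive torque commands (e.g. from a queue), convert them to
        // byte[], and write them to client.GetStream() for this one device ...
    }
}
```

You would then deploy this one image four times, once per device, and route to each instance by name.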

Which part of Orleans is actually distributed?

There are a couple of confusing points in the documentation that make me struggle to understand how exactly distribution across the cluster happens in Orleans. Hence, the questions.
Question #1
Orleans claims to have built-in capabilities to distribute across multiple servers. To me it sounds like Orleans can act as a load balancer itself and can scale out automatically. Thus, if I deploy an Orleans app to several servers, then service discovery and load management should happen automatically, correct?
In that case, why do some docs and articles suggest using other tools, like Ocelot or Consul, as a single entry point to the Orleans cluster?
Question #2
I would like to use simple but distributed in-memory storage across several servers, like Redis or Apache Ignite, and I would like to know if it's possible to use a simple grain as this kind of data storage.
Let's say one grain stores a collection of restaurants, and some other grain keeps track of the last 1000 visitors for a selected restaurant. Can I activate these 2 grains only once, as singleton collections, add or remove records in each collection, and use these 2 grains as in-memory storage evenly available to all nodes in the cluster? Also, if the answer is yes, do I need to add locks to these collections, or does each grain always run on a single thread?
Service discovery and load management do indeed happen automatically.
Consul is not strictly required. The only external requirement is a membership table provider, something that is used internally by Orleans clustering. Several membership table providers come built in with Orleans, for example Azure Table Storage; all you need is to configure Orleans to use it and, of course, have an Azure storage account. Consul is simply another membership table provider, and there are more.
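For example, pointing the membership table at Azure Table Storage is just configuration. A sketch using Orleans 3.x-style APIs; the connection string and IDs are placeholders:

```csharp
// Silo host using Azure Table Storage as the membership table provider.
using Microsoft.Extensions.Hosting;
using Orleans.Configuration;
using Orleans.Hosting;

var host = new HostBuilder()
    .UseOrleans(silo => silo
        .UseAzureStorageClustering(opts =>
            opts.ConnectionString = "<azure-storage-connection-string>")
        .Configure<ClusterOptions>(opts =>
        {
            opts.ClusterId = "my-cluster";
            opts.ServiceId = "my-service";
        }))
    .Build();

await host.RunAsync();
```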
Another thing that does not come built in is infrastructure scaling. If demand on your service increases, something needs to ask the infrastructure provider (the cloud provider) to add more servers. Once servers are added, Orleans will automatically adjust the workload and load-balance across the new servers as well. But figuring out that more servers are needed, and adding them, is not done by Orleans itself (there are likely externally contributed tools for this; maybe Kubernetes can be configured to do it, but I am not completely sure).
Yes, you can use those 2 grains as in-memory storage, just as you wrote. And no, you do not need to use locks: all grains are single-threaded.
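A minimal sketch of that singleton-collection idea (the names are invented): give the grain a fixed key, so calls from every node land on the same activation, and rely on the grain's turn-based execution instead of locks.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Orleans;

public interface IRestaurantsGrain : IGrainWithIntegerKey
{
    Task Add(string name);
    Task<List<string>> GetAll();
}

public class RestaurantsGrain : Grain, IRestaurantsGrain
{
    private readonly List<string> _restaurants = new();

    // No locking needed: the Orleans scheduler runs one call at a time.
    public Task Add(string name)
    {
        _restaurants.Add(name);
        return Task.CompletedTask;
    }

    public Task<List<string>> GetAll() =>
        Task.FromResult(new List<string>(_restaurants));
}

// Any node in the cluster reaches the same activation via the fixed key:
// var grain = client.GetGrain<IRestaurantsGrain>(0);
```

One caveat implied by the single-threading: all reads and writes serialize through that one activation, so a heavily used singleton grain can become a throughput bottleneck.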

How to make ACE/TAO service setup more user-friendly?

The standard way of setting up a network of applications communicating over the ACE/TAO CORBA framework has always been
run the naming service
run the event channel
run your applications
I'd like to relieve my end-users of having to spawn multiple background services by hand, and I'm looking for a clean solution. I'd also like my networks to be as plug-and-play as possible. That means we're synchronizing various hardware components with the help of a central controller instance. Each of these pairings makes up an (isolated) network, so we can have several of these in one environment and don't want any interference between them.
My idea was to just spawn a naming service and an event service on the controller's initialization, but I haven't found a nice way yet to spawn both processes (tao_cosnaming, tao_rtevent) as child processes so that they are really tied to the controller instance and don't keep running if, for example, the controller crashes. Is there already a mechanism inside TAO that allows this?
The Implementation Repository could do this for you. Another option is to link the Naming Service and Event Channel into your controller, giving you just one process that also provides these services.

Passing messages between remote MailboxProcessors?

I'm using MailboxProcessor classes in order to keep separate agents that each do their own thing. Normally the agents can communicate with one another in the same process, but I want agents to talk to one another when they are in separate processes, or even on different machines. What kind of mechanism is best for implementing communication between them? Is there some standard solution?
Please note that I'm using Ubuntu instances to run the agents.
I think you're going to have to write your own routines to serialize messages, pass them across the process boundaries, and then dispatch them on the other side. This will also require an implementation of an ID system, where each mailbox has an ID and processes send messages to IDs instead of calling Mailbox.Send directly. This is not easy, as local mailboxes can access local memory, but remote mailboxes cannot.
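To make that concrete, here is a rough sketch of such an ID-routing scheme, written in C# for brevity (the agents themselves would be F# MailboxProcessors; the envelope shape and transport are invented):

```csharp
// Sketch: messages serialized with an ID, sent over TCP, and dispatched
// to the matching local mailbox on the receiving side.
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

public record Envelope(string MailboxId, string Payload);

public class RemoteDispatcher
{
    private readonly Dictionary<string, Action<string>> _mailboxes = new();

    // Each local mailbox registers under a cluster-wide ID.
    public void Register(string id, Action<string> deliver) => _mailboxes[id] = deliver;

    // One serialized envelope per connection, routed by ID.
    public async Task ListenAsync(int port, CancellationToken ct)
    {
        var listener = new TcpListener(IPAddress.Any, port);
        listener.Start();
        while (!ct.IsCancellationRequested)
        {
            using var client = await listener.AcceptTcpClientAsync(ct);
            using var reader = new StreamReader(client.GetStream());
            var env = JsonSerializer.Deserialize<Envelope>(await reader.ReadToEndAsync());
            if (env is not null && _mailboxes.TryGetValue(env.MailboxId, out var deliver))
                deliver(env.Payload); // e.g. forward into mailbox.Post
        }
    }
}
```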
I would look at something like RPyC (http://rpyc.wikidot.com/), as it provides a protocol somewhat like the one you are looking for.
Basically the answer is "no", there isn't really a good way to do this.
