microservices & service discovery with random ports - docker

My question is related to microservices & service discovery of a service which is spread between several hosts.
The setup is as follows:
2 docker hosts (host A & host B)
a Consul server (service discovery)
Let’s say that I have 2 services:
service A
service B
Service B is deployed 10 times (with random ports): 5 times on host A and 5 times on host B.
When service A communicates with service B, for example, it sends a request to serviceB.example.com (hard coded).
In order to get an IP and a port, service A should query the Consul server for an SRV record.
It will get 10 ip:port pairs, for which the client should apply some load-balancing logic.
Is there a simpler way to handle this without developing my own client-side resolver and load-balancing library?
Is there anything like that already implemented somewhere?
Am I doing it all wrong?

There are a few options:
Load balance on the client, as you suggest. For that you'll need either to find a ready-built service discovery library that works with SRV records and handles load balancing and circuit breaking, or to build your own. Another answer suggested Netflix's Ribbon, which I have not used and which will only be interesting if you are on the JVM. Note that if you are building your own, you might find it simpler to use Consul's HTTP API for discovering services rather than DNS SRV records. That way you can "watch" for changes too, rather than caching the list and letting it go stale.
If you don't want to reinvent that particular wheel, another popular and simple option is to use an HAProxy instance as the load balancer. You can integrate it with Consul via consul-template, which will automatically watch for new or failed instances of your services and update the load balancer config. HAProxy then provides robust load balancing and health checking with a lot of options (HTTP/TCP, different balancing algorithms, etc.). One possible setup is to run a local HAProxy instance on each Docker host and assign a fixed port statically to each logical service (you can store it in Consul KV), so that you connect to localhost:1234 for service A, for example, and localhost:2345 for service B. A local instance means you don't pay for an extra round trip to a load balancer instance and then on to the actual service instance, though this might not be an issue for you.
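As a rough illustration of the client-side option, the sketch below parses a response shaped like the one Consul's HTTP health endpoint (`/v1/health/service/<name>`) returns and rotates through the instances round-robin. The sample payload, the addresses and ports, and the `RoundRobin` class are made up for illustration; a real client would fetch the JSON over HTTP and refresh the list when instances change.

```python
import itertools
import json

# Hypothetical payload, trimmed to the fields used here. The shape follows
# Consul's /v1/health/service/<name> response: one entry per instance.
SAMPLE = json.dumps([
    {"Node": {"Address": "10.0.0.1"}, "Service": {"Address": "", "Port": 32768}},
    {"Node": {"Address": "10.0.0.2"}, "Service": {"Address": "10.0.0.2", "Port": 32771}},
])


def parse_instances(payload):
    """Return (ip, port) pairs; fall back to the node address when the
    service entry has no address of its own."""
    pairs = []
    for entry in json.loads(payload):
        ip = entry["Service"]["Address"] or entry["Node"]["Address"]
        pairs.append((ip, entry["Service"]["Port"]))
    return pairs


class RoundRobin:
    """Hand out instances in rotation (no health checks, no circuit breaking)."""

    def __init__(self, instances):
        if not instances:
            raise ValueError("no instances to balance across")
        self._cycle = itertools.cycle(instances)

    def next(self):
        return next(self._cycle)


balancer = RoundRobin(parse_instances(SAMPLE))
picks = [balancer.next() for _ in range(3)]
print(picks)
```

This covers only the rotation; stale lists, retries and failed instances are exactly the parts that make a ready-built library or a local HAProxy attractive.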

I suggest you check out Kontena. It solves this kind of problem out of the box. Every service gets an internal DNS name that you can use for communication between services. Kontena also has a built-in load balancer that is easy to use, which makes it simple to create and scale microservices.
There are also lots of built-in features that help when developing containerized applications, such as a private image registry, VPN access to running services, secrets management, stateful services, etc.
Kontena is an open source project and the code is available on GitHub.

If you are looking for a minimal setup, you can wrap the values you receive from Consul with Ribbon, Netflix's client-side load balancer.
You will find it as a module for Spring Cloud.
I didn't find an up-to-date standalone example, only this link to chrisgray's dropwizard-consul implementation, which uses it in a Dropwizard context. It might still serve as a starting point for you.

Related

Service mesh with consul and docker swarm EE

I'm new to service mesh with Consul.
I found a lot of documentation about using Consul and Envoy for service mesh on Kubernetes, but I'm not finding much documentation about using it on Docker Swarm (Enterprise Edition).
My question is: is it possible to implement it on Docker Swarm EE? If not, what are the technical reasons that prevent it or make it inadvisable?
I wondered the same.
The main problem with Docker Swarm seems to be that it lacks the concept of "sidecar" containers. Kubernetes, for example, has "pods". I haven't used Kubernetes, but my understanding is that you can group containers into a unit called a pod, and this is what really enables the mesh-style architecture. One reason is that containers in the same pod can all communicate through localhost on different port bindings, i.e. they are local to each other. When you want a companion service, this is what you need: you know that communicating with it will be fast because it is co-located with your app.
Now consider Swarm. You can add services to your stack, but you don't necessarily know where they will be placed: your sidecar proxy service could end up on node 2 while your app is on node 1. This is not very efficient, because there are now network hops between your app and its sidecar proxy, which could be on the other side of the data centre when it should really be local.
So you start thinking of creative workarounds. What if I use placement settings to put my service and the sidecar service on the same node? Then you lose the ability for Swarm to move them to a different node if that node goes down, because your placement options have confined them to one node.
What if you deploy the sidecar proxy as a global service so that it is available on every node? Then your apps should all be able to reach it via the IP address of whatever node they are on, but how do you configure that IP address per task (container)? I'm exploring that option, but it gives you a single sidecar instance per node (one instance potentially serving many services), which affects how you scale the sidecar.
One other possible solution is to embed the sidecar services into your own service's Docker image so that they truly run locally with your app. However, I haven't seen anyone really advocate that approach, so it is most likely fraught with hurdles. Most documentation is for Kubernetes, and there is nothing for Swarm, for these sorts of reasons. If only Swarm had added this ability in its usual style of simplicity, it would have extended its reach so much.

Google Cloud Run Container Networking

I have a system of apps/services in docker containers that, when I bring them up using docker-compose, talk to each other using a bridge network.
Workers start up and register themselves with a manager. The manager assigns the workers work to do. In order to do this, the workers need to know where the manager is, and the manager needs to know where the workers are.
I want to deploy them all to Google Cloud Run.
At the moment, in docker via docker-compose, they talk to each other using their container names. For example the worker might call: http://manager:5000/register?name=worker1&port=5000 to register on startup, and then the manager can call http://worker1:5000 to send work. All thanks to the fact that they're connected to the same bridge network.
How does this work with Google Cloud Run? As far as I can see, when you create a service linked with a container, you get a permanent URL to communicate with your app once it has started. The app in the container doesn't know what the URL is.
Can I use the service names to communicate with each other in the same way as a docker bridge network?
Cloud Run currently does not support hostname based service discovery for other services in your project.
Currently, your best bet is to configure service URLs that your app depends on using environment variables or something like that.
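A minimal sketch of the environment-variable approach, assuming the manager's URL is passed to the worker in a variable called `MANAGER_URL` (a hypothetical name, as are the function and the example URL). The worker builds its registration call from that value instead of relying on a container hostname:

```python
import os
from urllib.parse import urlencode


def manager_register_url(worker_name, env=os.environ):
    """Build the registration URL from MANAGER_URL (a hypothetical variable name)."""
    base = env.get("MANAGER_URL")
    if not base:
        raise RuntimeError("MANAGER_URL is not set")
    query = urlencode({"name": worker_name})
    return f"{base.rstrip('/')}/register?{query}"


# Example with a placeholder URL; in a deployment the value would be set
# on the service's configuration rather than passed in like this.
print(manager_register_url("worker1", {"MANAGER_URL": "https://manager.example.com/"}))
```

The same code keeps working under docker-compose if `MANAGER_URL` is set to `http://manager:5000` there.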
In fact, you can't orchestrate the workers in the same way. Cloud Run services respond to HTTP requests; when an instance is spawned, it does not register with a manager.
If you want to perform several tasks in parallel, perform several HTTP requests.
If you want strong isolation between the different instances of the same service, set the concurrency parameter to 1 (each instance of the service then processes only one HTTP request at a time).
For information, you can have up to 100 instances of the same service.
So, deploy a manager service and a worker service. The manager service sends HTTP requests to the worker service with the right parameters for each job.
Take care with the job duration. For now, the timeout can be set to a maximum of 900 seconds (15 minutes).
About the naming, the pattern is the following: https://<service-name>-<project-hash>.run.app/
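The manager-to-worker pattern described above could be sketched roughly as follows. The function names, the JSON job format and the worker URL are assumptions for illustration, and authentication between the two services is left out. The manager issues one HTTP request per job, in parallel, and the platform decides how many worker instances serve them:

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def post_job(worker_url, job):
    """Send one job to the worker service as a JSON POST and return the body."""
    request = urllib.request.Request(
        worker_url,
        data=json.dumps(job).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Client-side timeout chosen to match the 900-second limit mentioned above.
    with urllib.request.urlopen(request, timeout=900) as response:
        return response.read().decode("utf-8")


def dispatch(worker_url, jobs, send=post_job, max_parallel=10):
    """Run one request per job in parallel; results come back in job order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(lambda job: send(worker_url, job), jobs))


# Dry run with a stand-in sender so nothing touches the network.
fake_send = lambda url, job: f"{url} handled {job['task']}"
results = dispatch(
    "https://worker.example.com/run",
    [{"task": "a"}, {"task": "b"}],
    send=fake_send,
)
print(results)
```

With the concurrency parameter set to 1 on the worker service, each of these parallel requests would be handled by its own instance.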

On-prem docker swarm deployment with HA

I’m doing on-prem deployments using docker swarm and I need application and DB high availability.
As far as application HA is concerned, it works great within docker (service discovery and load balancing), but I’m not sure how to use it on my network. I mean how can I assign a virtual IP to all of my docker managers so that if any of them goes down, that virtual IP automatically points to the other docker manager in the cluster. I don’t want to have a single point of failure in my architecture, that’s why I’m not inclined to use any (single) reverse proxy solution in front of my swarm cluster (because to my understanding, if nginx/HAProxy goes down, the whole system goes into abyss. I would love to know that I’m wrong).
Secondly, I use WebSockets in my application for push notifications, and they don't behave normally behind all the load balancing because the socket handshakes get distorted.
I want a solution to these problems without writing anything in code (HA-specific and non-generic like hard coding IPs etc). Any suggestions? I hope I explained my problem correctly.
Docker Flow Proxy or Traefik can be placed on the set of swarm nodes that you want to receive incoming connections, and they use DNS routing to get packets to the correct containers. Both have a sticky sessions option (I know Docker Flow does; I'm not sure about Traefik).
Then you can either:
1. Use DNS round robin with multiple A records, which works great if your incoming connections are just client HTTP/S requests, or
2. Buy an expensive, hardware, fault-tolerant reverse proxy like F5, or
3. Use some network-layer IP failover at the OS and physical network level (not really related to Docker), though I'm not sure how well that would work with Swarm.
Number 2 is the typical solution in private datacenters that need full HA at all layers.

Is it possible to do webserver affinity in a Docker Swarm?

I have a Docker container that is a REST API webserver. I want to use this webserver in a Docker Swarm. A couple of the REST API calls are used in an asynchronous pattern. That is, the first call provides data for processing and returns a request identifier. The second call uses the request identifier to check on the processing and get the results when processing is done. Since there is no connection between any of the webservers in the Docker Swarm, how can I force the second REST API call back to the Docker instance that handled the first REST API call? Is there any way to ensure webserver affinity for these two REST API calls in a Docker Swarm?
With Docker Swarm Mode and Ingress networking, connections are processed with round robin load balancing, and this isn't configurable. If the connection remains open, which is the case for most web browsers, you'll find that requests go back to the same instance.
You can use a reverse proxy in front of your application that is aware of each instance of the service. Docker has this with their HRM tool in the EE offering, and many other reverse proxies, like Traefik, offer similar sticky session options.
If you can, a better design would be to use an external cache, e.g. Redis, for any persistence. This way you can perform a rolling update of your application without breaking all the sessions.
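To make the external-cache design concrete, here is a minimal sketch in which every webserver replica reads and writes job state through a shared store, so the second call can land on any replica. `JobStore` is an in-memory stand-in for something like Redis, and all class and method names are hypothetical:

```python
import uuid


class JobStore:
    """Stand-in for an external cache such as Redis; a dict keeps the sketch runnable."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class ApiInstance:
    """One webserver replica. It keeps no job state of its own."""

    def __init__(self, store):
        self.store = store

    def submit(self, payload):
        # First REST call: record the job and hand back a request identifier.
        request_id = str(uuid.uuid4())
        self.store.set(request_id, {"status": "pending", "input": payload})
        return request_id

    def finish(self, request_id, result):
        # Called when processing completes.
        self.store.set(request_id, {"status": "done", "result": result})

    def status(self, request_id):
        # Second REST call: any replica can answer from the shared store.
        return self.store.get(request_id)


shared = JobStore()
replica_a, replica_b = ApiInstance(shared), ApiInstance(shared)
rid = replica_a.submit("data")
replica_a.finish(rid, "processed data")
print(replica_b.status(rid))
```

Because the request identifier is the only thing the client needs to carry, no affinity between the two calls is required.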

How does Istio compare to Docker Swarm?

Reading the documentation about Istio, I came up with these questions.
In which respects do Istio and Docker Swarm work the same?
Also, which one is better in which scenarios?
It's true that descriptions of Istio and Docker Swarm both use the term "service mesh".
However, the service mesh in Docker Swarm is more comparable to the Services model in Kubernetes, and the two orchestrators are generally comparable with respect to most of their features. In both orchestrators, service routing only touches the network layers and has no visibility into, e.g., the HTTP protocol.
Please note that the Kubernetes Ingress API should be considered separately. It sits above the service model and is in fact implemented by an external controller, e.g. Traefik or HAProxy; Istio brings another implementation of an ingress controller.
While Istio is (approx) one level above an orchestrator, right now it runs only on Kubernetes, but it will very likely support Docker Swarm along with other popular orchestrators in the future.
More specifically, Istio's service mesh is much more advanced than what Docker Swarm offers (and, by analogy, what Kubernetes Services offer), e.g. it enables fault injection, and transparent TLS, among many other features.
Docker Engine swarm mode makes it easy to publish ports for services to make them available to resources outside the swarm. All nodes participate in an ingress routing mesh. The routing mesh enables each node in the swarm to accept connections on published ports for any service running in the swarm, even if there’s no task running on the node. The routing mesh routes all incoming requests to published ports on available nodes to an active container.
Istio is an open platform to connect, manage, and secure services. Essentially, it's an open service mesh, where we would like developers and operators not to have headaches about how to connect services, how to make them resilient, how to secure them, and how to manage the runtime. We would like Istio to be able to do that for developers and operators across all environments and clouds. And when I say services, it's really all kinds of services, not necessarily just micro. It could be anything: a MySQL API service you are building, or a really small microservice within your application for payment or shopping, in any given language. So Istio takes the approach of working with a polyglot environment. No matter which language you write your services in and where you have deployed them, Istio would like to give you a uniform substrate between your application and the network, which can take care of connectivity between services, resiliency between services (things like retries, failover, all of the good stuff in distributed systems), and securing services. We think internal services should be as secure as external ones, so security by default. And it should give you complete observability and visibility of metrics all the way from L3 to L7 across all your services.
Essentially, think about a layer (some people call it L5) that sits between your application and the network. We are injecting a proxy next to every service, and those proxies are in the data path of all service-to-service communication. They are all interconnected and also connected to a common control plane. That interconnected set of proxies living next to every service is what is typically called a service mesh. The reason it is so interesting is that once you think of the mesh as a layer that exists the way a network does, you can offload things like connectivity, resiliency, and visibility from the application layer to that layer. Historically, you could do this in application libraries: if you are building in Java, Python or Go, there are a bunch of libraries in each of these languages that you could import and write logic into. Or you could do L3 security and policy like IP whitelisting, firewall rule setup, VPN networking, VPN peering, and so on. We think the service mesh is a space between the two that can offload a few things from L7 and give you policy-driven contracts to operate your network. So the Istio service mesh is much better than the Docker Swarm service mesh.
It's an apples to oranges comparison in many respects. Istio (currently) runs on top of Kubernetes, a container orchestrator like Docker Swarm.

Resources