Linked Docker Containers with Mesos/Marathon - docker

I'm having great success so far using Mesos, Marathon, and Docker to manage a fleet of servers, and the containers I'm placing on them. However, I'd now like to go a bit further and start doing things like automatically linking an haproxy container to each main docker service that starts, or provide other daemon based and containerized services that are linked and only available to the single parent container.
Normally, I'd start up the helper service first with some name, then when I started the real service, I'd link it to the helper and everything would be fine. How does this model fit in to Marathon and Mesos though? It seems for now at least that the containerization assumes a single container.
I had one idea to start the helper service first, on whatever host it could find, then add a constraint to the real service that the hostname = helper service's hostname, but that seems like it'd cause issues with resource offers and race conditions for those resources.
I've also thought to provide an "embed", or "deep-link" functionality to docker, or to the executor scripts that start the docker containers.
Before I head down any of these paths, I wanted to find out if someone else had solved this problem, or if I was just horribly over thinking things.
Thanks!

you're wandering in uncharted territory! ☺
There are multiple approaches here; and none of them is perfect, but the situation will improve in future versions of Docker, thanks to orchestration hooks.
One way is to use good old service discovery and registration. I.E., when a service starts, it will figure out its publicly available address, and register itself in e.g. Zookeeper, Etcd, or even Redis. Since it's not trivial for a service to figure out its publicly available address (unless you adopt some conventions, e.g. always mapping port X:X instead of letting Docker assing random ports), you might want to do the registration from outside. That means that your orchestration layer (Mesos in that case) would start the container, then figure out the host and port, and put that in your service discovery system. I'm not extremely familiar with Marathon, but you should be able to register a hook for that. Then, other containers will just look up the endpoint address in the service discovery registry, plain and simple.
You could also look at Skydock, which automatically registers DNS names for your containers with Skydns. However, it's currently single-host, so if you like that idea, you'll have to extend it somehow to support multiple hosts, and maybe SRV records.
Another approach is to use "well-known entry points". This is actually is simplified case of service discovery. It means that you will make sure that your services will always run on pre-set hosts and ports, so that you can use those addresses statically. Of course, this is bad (because it will make your life harder when you will want to reproduce the environment for testing/staging purposes), but if you have no clue at all about service discovery, well, it could be a start.
You could also use Pipework to create one (or multiple) virtual network spanning across multiple hosts, and binding your containers together. Pipework will let you assign IP addresses manually, or automatically through DHCP. This approach is not recommended, though, but it's a good fit if you also want to plug your containers into an existing network architecture (e.g. VLANs...).
No matter which solution you decide to use, I highly recommend to "pretend" that you're using links. I.e. instead of hard-coding your app configuration to connect to (random example) my-postgresql-db:5432, use environment variables DB_PORT_5432_TCP_ADDR and DB_PORT_5432_TCP_PORT (as if it were a link), and set those variables when starting the container. That way, if you "fold down" your containers into a simpler environment without service discovery etc., you can easily fallback on links without efforts.

Related

How to route all internet requests through a proxy in docker swarm

tldr; does docker swarm have a forceful and centered proxy setting that explicitly proxies all internet traffic in all services that is hosted in the cluster? Or any other tip of how to go about using a global proxy solution in a swarm cluster...?
Obs! this is not a question about a reversed proxy.
I have a docker swarm cluster (moving to Kubernatives as a solution is off-topic)
I have 3 managers and 3 workers, I label the workers accordingly to the expected containers they can host. The cluster only deploys docker swarm services, when I write "container" in this writing I'm referring to a docker swarm service container.
One of the workers is labelless, though active, and therefore does not host any containers to any service. If I would label the worker to allow it to host any container, then I will suffer issues in different firewalls that I don't always control, because the IP simply is not allowed.
This causes the problem for me that I can't do horizontal scaling, because when I add a new worker to the cluster, I also add a new IP that the requests can originate from. To update the many firewalls that would need to be updated because of a horizontal scaling is quite large, and simply not an option.
In my attempt to solve this on my own, I did what every desperate developer does and googled for a solution... and there is a simple and official documentation to be able to achieve this: https://docs.docker.com/network/proxy/
I followed the environment variables examples on that page. Doing so did however not really help, none of the traffic goes through the proxy I configured. After some digging, I noticed that this is due to nodejs (all services are written using nodejs), ignoring the proxy settings set by the environment. To solve that nodejs can use these proxy settings, I have to refactor a lot of components in a lot of services... a workload that is quite trumendus and possibly dangerous to perform given the different protocols and ports I use to connect to different infrastructural services outside the cluster...
I expect there to be a better solution for this, I expect there to be a built in functionality that forces all internet access from the containers to go through this proxy, a setting I don't have to make in the code, in my implementations. I expect there to be a wrapping solution that I can control in a central manner.
Now reading this again, I think maybe I should have tested the docker client configuration on the same page to see if it has the desired effect I'm requiring, but I assume they both would have the same outcome, being described on the same page with no noticeable difference written in the documentation.
My question is, is there a solution, that I just don't seem to be able to find, that wraps the proxy functionality around all the services? or is it a requirement to solve these issues in the implementation itself?
My thought is to maybe depend on an image, that in its turn depends on the nodejs image that I use today - that is responsible for this wrapping functionality, though still on an implantation level. Doing so would however still force the inheriting of a distributed solution of this kind - if I need to change the proxy configurations, then I need to change them everywhere, and redeploy everything... given a less complex solution without an in common data access layer.

Service mesh with consul and docker swarm EE

I'm new in service mesh with Consul.
I found a lot of documentation about using Consul and Envoy for service mesh in K8S but I'm not finding much documentation about using it on docker swarm (Enterprise Edition).
My question is: is it possible to implement it on Docker Swarm EE? If not, what are the technical reasons that prevent or not recommend to implement it?
I wondered the same.
The main problem with docker swarm it seems is it lacks the concept of "sidecar" containers. For example, k8's has "pods". I haven't used k8's, but my understanding is that, you can group services into a unit called a "pod". This has benefits and really enables the mesh style architecture.. one reason is that services in the same "pod" can all communicate through "localhost" on different port bindings - i.e the services are "local" to eachother. When you want a "companion" service this is what you need as you know communicating with it is going to be fast as it is essentially local / co located with your app. Now consider swarm. You can add services to your stack, but you don't necessarily know where they are going to be placed - your "side car proxy" servcice could end up being placed on node 2 whilst your app is on node 1. This is not very efficient as it means there are now network hops to route traffic between your app and its "sidecar" proxy which could be on the other side of the data centre, but should really be local. So you start thinking of creative workarounds.. What about if I use "placement" settings to place my service and the sidecar service on the same node? Well then you lose the ability for swarm to place them on a different node if that node goes down, because your placement options have confined it to only one node. What if.. you deploy the "sidecar" proxy as a "global" service so that it is available on each node? Then your apps should all be able to communicate with the service via the IP address of whatever node its on.. but how do you configure that IP address per task (container)? I'm exploring that option, but then that gives you a single sidecar instance per node (1 instance to potentially serve many services) so this has impacts for how you scale that sidecar. I think possibly one other solution is that you have to embed these "sidecar" services into your own service docker image so that they are truly running locally with your app. However I haven't seen any that really advocate that approach so it's most likely fraught with hurdles to overcome. Most documentation is for k8s,, and nothing for swarm for these sorts of reasons. If only swarm could have added this ability in it's style of simplicity it would extend its reach so much.

Does Docker Swarm keep data synced among nodes?

I've never done anything with Docker Swarm, or Kubernetes so I'm trying to learn what does what, and which is best for my purpose before tackling it.
My scenario:
I have a Desktop PC running Docker Desktop, and ..
I have a Raspberry PI running Docker on Raspbian
This is all on a home LAN, so I don't really want to get crazy with complicated things.
I want to run Pi Hole and DNSCrypt Proxy containers on both 'machines', (as redundancy, mostly because the Docker Desktop seems to crash a lot taking down my entire DNS system with it when I just use that machine for Pi-hole).
My main thing is, I want all the data/configurations, etc. between them to stay in sync (i.e. Pi hole's container data stays in sync on both devices, etc.), and I want the manager to make sure it's always up, in case of crashes, and so on.
My questions:
Being completely new to this area, and just doing a bit of poking around:
it seems that Kubernetes might be a bit much, and more complicated than I need for this?
That's why I was thinking Swarm instead, but I'm also not sure whether either of them will keep data synced?
And, say I create 2 Pi-hole containers on the Manager machine, does it create 1 on the manager machine, and 1 on the worker machine?
Any info is appreciated!
Docker doesn't quite have anything that directly meets your need, but if you've got a reliable file server on your home LAN, you could do it really easily.
Broadly speaking you want to look at Docker Volume Plugins. Most of them ultimately work via an external storage provider and so won't be that helpful for you. There's a couple of more exotic ones like Portworx or StorageOS that can do portable/replicated storage purely in Docker, but I think most of them are a paid license.
But, if you have a fileserver that you trust to stay up and running, you can mount an NFS/CIFS share as a volume as mentioned in the Docker Docs, and Docker can handle re-connecting it when a container moves from one node to another due to a failure.
One other note: you want two manager nodes and one container per service in your swarm. You need to have one working Manager node for the swarm to work (this is important if a Manager crashes). Multiple separate instances would generally only be helpful if the service was designed as a distributed/fault tolerant application.

Why don't use host network in docker since docker and kubernetes network is so complex

Using docker can simplify CI/CD but also introduce the complexity, not everybody able to hold the docker network though selecting open source solutions like Flannel, Calico.
So why don't use host network in docker, or what lost if use host network in docker.
I know the port conflict is one point, any others?
There are two parts to an answer to your question:
Pods must have individual, cluster-routable, IP addresses and one should be very cautious about recycling them
You can, if you wish, not use any software defined network (SDN)
So with the first part, it is usually a huge hassle to provision a big enough CIDR to house the address range required for supporting every Pod that is running across every Namespace, and have the space be big enough to avoid recycling addresses for a very long time. Thus, having an SDN allows using "fake" addresses that one need not bother the "real" network with knowing about. No routers need to be updated, no firewalls, no DHCP, whatever.
That said, as with the second part, you don't have to use an SDN: that's exactly what the container network interface (CNI) is designed to paper over. You can use the CNI provider that makes you the happiest, including using static IP addresses or the outer network's DHCP server.
But your comment about port collisions is pretty high up the list of reasons one wouldn't just want to hostNetwork: true and be done with it; I'm actually not certain if the default kubernetes scheduler is aware of hostNetwork: true and the declared ports: on the containers: in order to avoid co-scheduling two containers that would conflict. I guess try it and see, or, better yet, don't try it -- use CNI so the next poor person who tries to interact with your cluster doesn't find a snowflake setup.

docker stack with overlay network & name resolution

I'm totally new to docker and started yesterday to do some tutorials. I want to build a small test application consisting of several different services (replicated and so on) that interact with each other and encountered a problem regarding 'service-discovery'. I started with the get-started tutorials on docker.com and at the moment i'm not really sure what's best practice in the world of docker to let the different containers in a network get to know each other...
As this is a rather vague 'problem description', i try to make this more precise. I want to use a few independent services (e.g. with stuff like postgre, mongodb, redis and rabbitmq...) together with a set of worker nodes to which work is assigned by a dedicated master node. Since it seems to be quite convenient, I wanted to use a docker-composer.yml file to define all my services and deploy them as a stack.
Moreover, I created a custom network and since it seems not to be possible to attach a stacked service to a bridge network, I created an attachable overlay network.
To finally get to the point: even though the services are deployed correctly, their actual container-name is random and without using somekind of service registry I'm not able to resolve their addresses.
A simple solution would be to use single containers with fixed container names - however this does not seem to be a best practice solution (even though it is actually just a docker-based DNS that is based on container names rather than domain names). Another problem are the randomly generated container names that contain underscores, and hence these names are not valid addresses that can be resolved...
best regards
Have you looked at something like Kubernetes? To quote from the home page:
It groups containers that make up an application into logical units for easy management and discovery.

Resources