I'm totally new to Docker and started doing some tutorials yesterday. I want to build a small test application consisting of several different services (replicated and so on) that interact with each other, and I've run into a problem regarding service discovery. I started with the get-started tutorials on docker.com, and at the moment I'm not really sure what the best practice in the Docker world is for letting the different containers in a network get to know each other...
As this is a rather vague problem description, I'll try to make it more precise. I want to use a few independent services (e.g. PostgreSQL, MongoDB, Redis and RabbitMQ) together with a set of worker nodes to which work is assigned by a dedicated master node. Since it seems to be quite convenient, I wanted to use a docker-compose.yml file to define all my services and deploy them as a stack.
Moreover, I created a custom network: since it doesn't seem to be possible to attach a stack service to a bridge network, I created an attachable overlay network.
To finally get to the point: even though the services are deployed correctly, their actual container names are random, and without using some kind of service registry I'm not able to resolve their addresses.
A simple solution would be to use single containers with fixed container names; however, this does not seem to be a best-practice solution (even though it is effectively just Docker's built-in DNS resolving container names rather than domain names). Another problem is that the randomly generated container names contain underscores, so they are not valid host names that can be resolved...
Best regards
Have you looked at something like Kubernetes? To quote from the home page:
It groups containers that make up an application into logical units for easy management and discovery.
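Independently of Kubernetes, note that in a Swarm stack you usually don't need the random container names at all: on a shared overlay network, Docker's embedded DNS normally resolves the service name from the compose file (e.g. `redis`) to a virtual IP, and `tasks.<service>` (e.g. `tasks.redis`) to one address per running replica. The sketch below assumes that behavior; `resolve_all` is a hypothetical helper built only on the standard library's `socket.getaddrinfo`.

```python
import socket


def resolve_all(host: str, port: int = 0) -> list[str]:
    """Return the sorted, de-duplicated IPv4 addresses that `host` resolves to.

    Inside a Swarm stack, `host` would typically be a service name such as
    "redis" (virtual IP) or "tasks.redis" (one address per running replica).
    """
    infos = socket.getaddrinfo(host, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Run from inside a container attached to the stack's network, `resolve_all("tasks.redis")` would be expected to list one IP per replica; outside such a network the name simply fails to resolve with `socket.gaierror`.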
TL;DR: does Docker Swarm have a centralized, enforced proxy setting that explicitly routes all internet traffic from all services hosted in the cluster through a proxy? Or is there any other tip on how to use a global proxy solution in a swarm cluster?
Note: this is not a question about a reverse proxy.
I have a Docker Swarm cluster (moving to Kubernetes as a solution is off-topic).
I have 3 managers and 3 workers, and I label the workers according to the containers they are expected to host. The cluster only deploys Docker Swarm services; when I write "container" here, I'm referring to a Docker Swarm service container.
One of the workers has no labels and, although active, therefore does not host containers for any service. If I were to label that worker to allow it to host containers, I would run into problems with various firewalls that I don't always control, because its IP is simply not allowed.
This is a problem because it prevents horizontal scaling: every worker I add to the cluster adds a new IP that requests can originate from. Updating the many firewalls affected by each scaling step is too much work, and simply not an option.
In my attempt to solve this on my own, I did what every desperate developer does and googled for a solution... and there is simple, official documentation for achieving this: https://docs.docker.com/network/proxy/
I followed the environment-variable examples on that page. However, this did not help: none of the traffic goes through the proxy I configured. After some digging, I noticed that this is because Node.js (all services are written in Node.js) ignores the proxy settings set in the environment. To make Node.js use these proxy settings, I would have to refactor a lot of components in a lot of services... a tremendous workload, and possibly a dangerous one given the different protocols and ports I use to connect to various infrastructure services outside the cluster...
I expect there to be a better solution: a built-in feature that forces all internet access from the containers to go through this proxy, a setting I don't have to make in the code of my implementations. I expect there to be a wrapping solution that I can control centrally.
Re-reading this, I think I should perhaps have tested the Docker client configuration described on the same page to see whether it has the effect I need, but I assume both approaches would have the same outcome, since they are described on the same page with no noticeable difference in the documentation.
My question is: is there a solution, which I just can't seem to find, that wraps the proxy functionality around all the services? Or is it a requirement to solve this in the implementation itself?
My idea is to perhaps depend on an image, which in turn depends on the Node.js image I use today, that is responsible for this wrapping functionality, though still at the implementation level. However, this would still be a distributed solution: if I need to change the proxy configuration, I would have to change it everywhere and redeploy everything, since there is no common data access layer.
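To make the refactoring cost concrete: the environment-variable convention on that Docker page only defines variables (`HTTP_PROXY`, `HTTPS_PROXY`, `NO_PROXY`); each HTTP client has to read and apply them itself. Below is a minimal sketch of that per-client logic in Python. `proxy_for` is a hypothetical helper, and its `NO_PROXY` handling is a simplified version of the common suffix-match convention (no CIDR ranges or ports; real clients also differ on whether upper- or lower-case names win).

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlsplit


def proxy_for(url: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Pick the proxy for `url` from HTTP(S)_PROXY / NO_PROXY-style variables.

    Simplified: NO_PROXY is a comma-separated list of host names or domain
    suffixes, plus "*" to disable proxying entirely.
    """
    def get(name: str) -> str:
        # Accept both spellings; upper-case is checked first in this sketch.
        return env.get(name.upper()) or env.get(name.lower()) or ""

    parts = urlsplit(url)
    host = (parts.hostname or "").lower()

    for entry in (e.strip().lower() for e in get("no_proxy").split(",")):
        if not entry:
            continue
        if entry == "*":
            return None
        suffix = entry.lstrip(".")
        if host == suffix or host.endswith("." + suffix):
            return None  # host is excluded from proxying

    if parts.scheme == "https":
        return get("https_proxy") or None
    if parts.scheme == "http":
        return get("http_proxy") or None
    # Other protocols (AMQP, database drivers, ...) are not covered by these variables.
    return None
```

For comparison, Python's own `urllib.request` applies these variables automatically via `getproxies()`, while the Node.js services described above ignore them. The last branch is also the crux of the question: non-HTTP connections are not governed by these variables at all.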
I've never done anything with Docker Swarm or Kubernetes, so I'm trying to learn what does what, and which is best for my purpose, before tackling it.
My scenario:
I have a desktop PC running Docker Desktop, and...
I have a Raspberry Pi running Docker on Raspbian.
This is all on a home LAN, so I don't really want to get crazy with complicated things.
I want to run Pi-hole and DNSCrypt Proxy containers on both machines as redundancy, mostly because Docker Desktop seems to crash a lot, taking down my entire DNS system with it when I use only that machine for Pi-hole.
My main requirement is that all the data, configuration, etc. stays in sync between them (i.e. Pi-hole's container data stays in sync on both devices), and I want the manager to make sure the containers are always up in case of crashes and so on.
My questions:
Being completely new to this area, and just doing a bit of poking around:
it seems that Kubernetes might be a bit much, and more complicated than I need for this?
That's why I was thinking of Swarm instead, but I'm also not sure whether either of them will keep data synced.
And if I create 2 Pi-hole containers on the manager machine, does it create 1 on the manager machine and 1 on the worker machine?
Any info is appreciated!
Docker doesn't quite have anything that directly meets your need, but if you've got a reliable file server on your home LAN, you could do it really easily.
Broadly speaking, you want to look at Docker volume plugins. Most of them ultimately work via an external storage provider and so won't be that helpful for you. There are a couple of more exotic ones, like Portworx or StorageOS, that can do portable/replicated storage purely in Docker, but I think most of them require a paid license.
But if you have a file server that you trust to stay up and running, you can mount an NFS/CIFS share as a volume, as described in the Docker docs, and Docker can handle reconnecting it when a container moves from one node to another due to a failure.
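For a Swarm service, the NFS share can be declared directly in the service's `--mount` option, so whichever node runs the task creates the volume itself. The sketch below only builds the `docker service create` argument list (it does not run Docker); the server address, export path, image and names are placeholders, and the mount syntax follows the NFS example in the Docker volumes documentation.

```python
import shlex


def nfs_service_cmd(service: str, image: str, nfs_addr: str, export: str,
                    target: str, volume: str = "pihole-data") -> list[str]:
    """Build a `docker service create` command that mounts an NFS export.

    All argument values are illustrative placeholders, not real hosts or paths.
    """
    mount = ",".join([
        "type=volume",
        f"source={volume}",
        f"target={target}",
        "volume-driver=local",
        "volume-opt=type=nfs",
        f"volume-opt=device=:{export}",  # leading ":" = path on the NFS server
        f"volume-opt=o=addr={nfs_addr}",
    ])
    return ["docker", "service", "create", "--name", service,
            "--replicas", "1", "--mount", mount, image]


cmd = nfs_service_cmd("pihole", "pihole/pihole:latest", "192.168.1.10",
                      "/exports/pihole", "/etc/pihole")
print(shlex.join(cmd))  # the shell command you would run on a manager node
```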
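One other note: with only two machines, making both of them managers does not buy you fault tolerance. Swarm managers need a majority (quorum) of managers to be available to manage the swarm, so if either of two managers crashes, the remaining one cannot reschedule containers on its own (containers that are already running keep running); tolerating the loss of one manager takes three managers. Also, you generally want one container (one replica) per service: multiple separate instances would generally only be helpful if the service was designed as a distributed/fault-tolerant application.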
I'm encountering a strange issue with docker-compose on one of my systems.
I have two TICK (Telegraf, InfluxDB, Chronograf, Kapacitor) docker-compose "projects" on the same machine, using the following docker-compose.yml:
Since both services are proxied behind the same NGINX SSL instance, they both join a common nginx_proxy external network.
The issue is that as soon as I start the second compose stack, the first stack starts misbehaving: a few (about 20%) of the requests from the first Chronograf instance to the influxdb service, via the influxdb hostname, are somehow redirected to the InfluxDB instance of the second stack.
I understand that since they are on the same nginx network they can communicate, but how can I force the first instance to always target its own service rather than crossing over to the other compose project? I tried specifying links, but it did not work.
Any configuration I could setup to achieve this isolation without having to rename all my services?
I wouldn't expect the embedded DNS server to do what you want automatically, since the situation seems ambiguous to me (the docker-compose file doesn't express what you expressed in natural language here, which is perfectly clear and unambiguous). In fact, I'm even surprised it allows this to run at all (it seems to be doing round-robin across the containers, which would explain why only some requests are misrouted).
To answer your question, I would simply use unambiguous aliases or just unambiguous service names, though there might be other solutions.
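To see why the shared network makes the name ambiguous, here is a deliberately simplified toy model (not Docker's actual resolver). Compose registers each service name as an alias on every network the service joins, so a lookup of `influxdb` from a container that shares `nginx_proxy` with both stacks can match two containers. All container names, networks and IPs below are made up, and each container gets a single IP for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    name: str
    ip: str  # toy simplification: one IP per container instead of one per network
    # network name -> aliases the container answers to on that network
    aliases: dict[str, set[str]] = field(default_factory=dict)


def resolve(name: str, caller: Container, containers: list[Container]) -> list[str]:
    """Toy lookup: any container that carries `name` as an alias on a network
    the caller is also attached to counts as a match."""
    hits = set()
    for c in containers:
        for net, aliases in c.aliases.items():
            if net in caller.aliases and name in aliases:
                hits.add(c.ip)
    return sorted(hits)


# Both stacks attach chronograf and influxdb to their own default network AND to nginx_proxy.
chrono1 = Container("stack1_chronograf_1", "10.0.1.2",
                    {"stack1_default": {"chronograf"}, "nginx_proxy": {"chronograf"}})
influx1 = Container("stack1_influxdb_1", "10.0.1.3",
                    {"stack1_default": {"influxdb"}, "nginx_proxy": {"influxdb"}})
influx2 = Container("stack2_influxdb_1", "10.0.2.3",
                    {"stack2_default": {"influxdb"}, "nginx_proxy": {"influxdb"}})
everything = [chrono1, influx1, influx2]

print(resolve("influxdb", chrono1, everything))  # two candidates: ambiguous

# Unambiguous alias that only stack1's influxdb carries; Chronograf would target this name.
influx1.aliases["stack1_default"].add("stack1-influxdb")
print(resolve("stack1-influxdb", chrono1, everything))  # exactly one candidate
```

The same reasoning suggests why keeping the database services off the shared proxy network also removes the ambiguity: only containers that NGINX must reach need to join `nginx_proxy`.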
We are looking into using Docker plus either Mesos/Marathon or Kubernetes for hosting a cluster. However, the one issue that we haven't really seen any answers for is how to allow clustered services to connect to each other correctly. All of the clustered services I have seen need to know about at least one other node before they can join the cluster; some need to know about every node. However, in Kubernetes and Mesos, there's no way to know what those IP addresses are ahead of time.
So, are there any best practices for this? If it helps, some technologies we're looking into deploying as containers are Elasticsearch, ActiveMQ, and MongoDB. There may be others.
However, the one issue that we haven't really seen any answers for is how to allow clustered services to connect to each other correctly.
I think you're talking about HA/replicated/sharded apps here.
At the moment, in Kubernetes, you can accomplish this by making an API call that lists all the "endpoints" of the service; that will tell you where your peers are running.
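A minimal sketch of that lookup, assuming the core/v1 `Endpoints` object returned by `GET /api/v1/namespaces/<namespace>/endpoints/<service>`: each entry in `subsets` pairs a list of ready `addresses` with a list of `ports`. The function only parses an already-fetched JSON document; the HTTP call and authentication are left out, and the sample object is made up.

```python
import json
from typing import Optional


def peers_from_endpoints(endpoints: dict, port_name: Optional[str] = None) -> list[str]:
    """Return sorted "ip:port" strings for the ready addresses in a v1 Endpoints object."""
    peers = []
    for subset in endpoints.get("subsets") or []:
        ports = subset.get("ports") or []
        if port_name is not None:
            ports = [p for p in ports if p.get("name") == port_name]
        # "addresses" holds ready endpoints; "notReadyAddresses" is ignored here.
        for addr in subset.get("addresses") or []:
            for p in ports:
                peers.append(f"{addr['ip']}:{p['port']}")
    return sorted(peers)


sample = json.loads("""
{
  "kind": "Endpoints",
  "metadata": {"name": "elasticsearch"},
  "subsets": [{
    "addresses": [{"ip": "10.244.1.5"}, {"ip": "10.244.2.7"}],
    "ports": [{"name": "transport", "port": 9300}, {"name": "http", "port": 9200}]
  }]
}
""")
print(peers_from_endpoints(sample, port_name="transport"))
```

A clustered service such as Elasticsearch could then use the resulting list as its initial set of peers to contact.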
We'd eventually like to support the use case you describe in a more first-class manner.
I filed https://github.com/GoogleCloudPlatform/kubernetes/issues/3419 to maybe get something more standardized started here.
I also wanted to set up an Elasticsearch cluster using Mesos/Marathon. As the existing "solutions" were either barely documented or not working/outdated, I set up my own container.
If you like, have a look at https://github.com/tobilg/docker-elasticsearch-marathon
If you have a running Marathon installation (I use v0.8.1), then setting up an Elasticsearch cluster should be a matter of a few minutes.
UPDATE:
The container now uses Elasticsearch v1.5.2 and is able to run on the latest Marathon v0.8.2.
As for Kubernetes, it currently requires kube-controller-manager to be started with the --machines argument, given a list of minion IPs or hostnames.
I don't see any easy way to handle this correctly in Kubernetes right now. Yes, you could make a call to the API that returns the list of endpoints, but you must watch for changes and take action when the endpoints change...
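The "watch for changes" part boils down to comparing successive peer lists and reacting to the difference. A polling sketch follows; `fetch_peers` and `on_change` are hypothetical callables you would supply (for example, `fetch_peers` could wrap an Endpoints lookup), and a production version would more likely use the API's watch mechanism than fixed-interval polling.

```python
import time
from typing import Callable, Iterable, Optional


def diff_peers(old: set, new: set) -> tuple[set, set]:
    """Return (added, removed) peers between two snapshots."""
    return new - old, old - new


def poll_peers(fetch_peers: Callable[[], Iterable[str]],
               on_change: Callable[[set, set], None],
               interval: float = 5.0,
               max_rounds: Optional[int] = None) -> set:
    """Call fetch_peers() every `interval` seconds and report differences to on_change.

    Runs forever when max_rounds is None; returns the last snapshot otherwise.
    """
    current: set = set()
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        latest = set(fetch_peers())
        added, removed = diff_peers(current, latest)
        if added or removed:
            on_change(added, removed)  # e.g. reconfigure the cluster membership
        current = latest
        rounds += 1
        if max_rounds is None or rounds < max_rounds:
            time.sleep(interval)
    return current
```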
I would prefer Mesos/Marathon, which is well prepared for this scenario: you would implement a custom framework for Mesos. There is already a framework for Elasticsearch: http://mesos.apache.org/documentation/latest/mesos-frameworks/
I'm having great success so far using Mesos, Marathon, and Docker to manage a fleet of servers and the containers I'm placing on them. However, I'd now like to go a bit further and start doing things like automatically linking an haproxy container to each main Docker service that starts, or providing other daemon-based, containerized services that are linked to and only available to the single parent container.
Normally, I'd start up the helper service first with some name, then when I started the real service, I'd link it to the helper and everything would be fine. How does this model fit in to Marathon and Mesos though? It seems for now at least that the containerization assumes a single container.
I had one idea to start the helper service first, on whatever host it could find, then add a constraint to the real service that the hostname = helper service's hostname, but that seems like it'd cause issues with resource offers and race conditions for those resources.
I've also thought to provide an "embed", or "deep-link" functionality to docker, or to the executor scripts that start the docker containers.
Before I head down any of these paths, I wanted to find out whether someone else has solved this problem, or whether I'm just horribly overthinking things.
Thanks!
You're wandering into uncharted territory! ☺
There are multiple approaches here; and none of them is perfect, but the situation will improve in future versions of Docker, thanks to orchestration hooks.
One way is to use good old service discovery and registration. I.e., when a service starts, it figures out its publicly available address and registers itself in e.g. Zookeeper, etcd, or even Redis. Since it's not trivial for a service to figure out its publicly available address (unless you adopt some conventions, e.g. always mapping port X:X instead of letting Docker assign random ports), you might want to do the registration from outside. That means that your orchestration layer (Mesos in this case) would start the container, then figure out the host and port, and put that in your service discovery system. I'm not extremely familiar with Marathon, but you should be able to register a hook for that. Then, other containers just look up the endpoint address in the service discovery registry, plain and simple.
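The "register from outside" flow can be sketched as follows. An in-memory dictionary stands in for Zookeeper/etcd/Redis, and the key layout (`/services/<name>/<instance>` mapped to `host:port`), class and method names are all hypothetical; in a real setup the orchestration hook would call `register` once it knows where the container landed and `deregister` when it stops.

```python
from typing import Dict, List


class Registry:
    """In-memory stand-in for a service-discovery store such as etcd or Zookeeper."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def register(self, service: str, instance: str, host: str, port: int) -> None:
        # Called from outside the container, by the orchestration layer's hook.
        self._data[f"/services/{service}/{instance}"] = f"{host}:{port}"

    def deregister(self, service: str, instance: str) -> None:
        self._data.pop(f"/services/{service}/{instance}", None)

    def lookup(self, service: str) -> List[str]:
        # Consumers list every endpoint registered under the service's prefix.
        prefix = f"/services/{service}/"
        return sorted(v for k, v in self._data.items() if k.startswith(prefix))


registry = Registry()
# e.g. two haproxy tasks whose container port was mapped to random host ports
registry.register("haproxy", "task-1", "10.0.0.11", 31005)
registry.register("haproxy", "task-2", "10.0.0.12", 31877)
print(registry.lookup("haproxy"))
```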
You could also look at Skydock, which automatically registers DNS names for your containers with SkyDNS. However, it's currently single-host, so if you like that idea, you'll have to extend it somehow to support multiple hosts, and maybe SRV records.
Another approach is to use "well-known entry points". This is actually a simplified case of service discovery: you make sure that your services always run on preset hosts and ports, so that you can use those addresses statically. Of course, this is bad (because it will make your life harder when you want to reproduce the environment for testing/staging purposes), but if you have no clue at all about service discovery, well, it could be a start.
You could also use Pipework to create one (or more) virtual networks spanning multiple hosts, binding your containers together. Pipework will let you assign IP addresses manually, or automatically through DHCP. This approach is not really recommended, but it's a good fit if you also want to plug your containers into an existing network architecture (e.g. VLANs...).
No matter which solution you decide to use, I highly recommend that you "pretend" you're using links. I.e. instead of hard-coding your app configuration to connect to (random example) my-postgresql-db:5432, use the environment variables DB_PORT_5432_TCP_ADDR and DB_PORT_5432_TCP_PORT (as if it were a link), and set those variables when starting the container. That way, if you "fold down" your containers into a simpler environment without service discovery etc., you can easily fall back on links without effort.
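On the application side, "pretending" to use links just means reading the link-style variables, with a fallback. A minimal sketch, assuming the variable names from the example above (`DB_PORT_5432_TCP_ADDR`, `DB_PORT_5432_TCP_PORT`); the `db_address` helper and its defaults are hypothetical.

```python
import os
from typing import Mapping, Tuple


def db_address(env: Mapping[str, str] = os.environ,
               default_host: str = "localhost",
               default_port: int = 5432) -> Tuple[str, int]:
    """Return (host, port) for the database from link-style environment variables.

    Works the same whether the variables were injected by a Docker link or set
    explicitly (by hand or by a service-discovery hook) when starting the container.
    """
    host = env.get("DB_PORT_5432_TCP_ADDR", default_host)
    port = int(env.get("DB_PORT_5432_TCP_PORT", default_port))
    return host, port


print(db_address({"DB_PORT_5432_TCP_ADDR": "10.0.0.21", "DB_PORT_5432_TCP_PORT": "49153"}))
```

When no link is used, the orchestration hook would pass the discovered values explicitly, e.g. `docker run -e DB_PORT_5432_TCP_ADDR=... -e DB_PORT_5432_TCP_PORT=...`.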