Docker and service clusters

We are looking into using Docker plus either Mesos/Marathon or Kubernetes for hosting a cluster. However, the one issue that we haven't really seen any answers for is how to allow clustered services to connect to each other correctly. All of the ones that I have seen need to know about at least one other node before they can join the cluster. Some need to know about every node. However, in Kubernetes and Mesos, there's no way to know what those IP addresses are ahead of time.
So, are there any best practices for this? If it helps, some technologies we're looking into deploying as containers are ElasticSearch, ActiveMQ, and MongoDB. There may be others.

However, the one issue that we haven't really seen any answers for is how to allow clustered services to connect to each other correctly.
I think you're talking about HA/replicated/sharded apps here.
At the moment, in Kubernetes, you can accomplish this by making an API call that lists all the "endpoints" of the service; that will tell you where your peers are running.
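For illustration, a minimal sketch of listing a service's endpoints (the service name my-service is a placeholder; current clusters expose this under the v1 API path shown, older releases used a beta path):

    # list the pod IPs backing a service
    kubectl get endpoints my-service -o yaml

    # or query the API server directly (assumes "kubectl proxy" is running on localhost:8001)
    curl http://localhost:8001/api/v1/namespaces/default/endpoints/my-service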
We'd eventually like to support the use case you describe in a more first-class manner.
I filed https://github.com/GoogleCloudPlatform/kubernetes/issues/3419 to maybe get something more standardized started here.

I also wanted to set up an ElasticSearch cluster using Mesos/Marathon. As the existing "solutions" were either undocumented or outdated/not working, I set up my own container.
If you like, have a look at https://github.com/tobilg/docker-elasticsearch-marathon
If you have a running Marathon installation (I use v0.8.1), then setting up an ElasticSearch cluster should be a matter of a few minutes.
UPDATE:
The container now uses Elasticsearch v1.5.2 and is able to run on the latest Marathon v0.8.2.
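For reference, a rough sketch of what deploying such a container through the Marathon REST API could look like (the Marathon hostname, image name and resource numbers below are assumptions for illustration, not taken from the repository; check its README for the real values):

    # POST an app definition to Marathon's /v2/apps endpoint
    curl -X POST http://marathon.example.com:8080/v2/apps \
      -H "Content-Type: application/json" \
      -d '{
            "id": "elasticsearch",
            "cpus": 1,
            "mem": 2048,
            "instances": 3,
            "container": {
              "type": "DOCKER",
              "docker": { "image": "tobilg/elasticsearch-marathon", "network": "HOST" }
            }
          }'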

As for Kubernetes, it currently does require kube-controller-manager to be started with the --machines argument, given a list of minion IPs or hostnames.
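For illustration, that invocation looks roughly like this (the minion IPs are placeholders, and the remaining flags are elided):

    kube-controller-manager --machines=10.0.0.1,10.0.0.2,10.0.0.3 ...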

I don't see any easy way to handle this correctly in Kubernetes right now. Yes, you could make a call to the API that returns the list of endpoints, but you must watch for changes and take action whenever the endpoints change...
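If you do go that route, the watch itself is straightforward; a sketch (service name and proxy port are placeholders):

    # block and stream changes to a service's endpoints
    kubectl get endpoints my-service --watch

    # or via the API server (assumes "kubectl proxy" on localhost:8001)
    curl "http://localhost:8001/api/v1/namespaces/default/endpoints?watch=true&fieldSelector=metadata.name=my-service"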
I would prefer to use Mesos/Marathon, which is well prepared for this scenario. You should implement a custom framework for Mesos. There is already a framework for ElasticSearch available: http://mesos.apache.org/documentation/latest/mesos-frameworks/

Related

How to route all internet requests through a proxy in docker swarm

tldr; does Docker Swarm have a forceful, centralized proxy setting that explicitly proxies all internet traffic in all services hosted in the cluster? Or any other tips on how to go about using a global proxy solution in a swarm cluster...?
Note: this is not a question about a reverse proxy.
I have a Docker Swarm cluster (moving to Kubernetes as a solution is off-topic).
I have 3 managers and 3 workers, and I label the workers according to the containers they are expected to host. The cluster only deploys Docker Swarm services; when I write "container" below, I'm referring to a Docker Swarm service container.
One of the workers has no labels, though it is active, and therefore does not host containers for any service. If I labelled that worker so it could host any container, I would run into issues with various firewalls that I don't always control, because its IP is simply not allowed.
This means I can't scale horizontally: when I add a new worker to the cluster, I also add a new IP that requests can originate from. Updating the many firewalls that would be affected by such scaling is a large effort and simply not an option.
In my attempt to solve this on my own, I did what every desperate developer does and googled for a solution... and there is simple, official documentation for achieving this: https://docs.docker.com/network/proxy/
I followed the environment-variable examples on that page. Doing so did not really help, however; none of the traffic goes through the proxy I configured. After some digging, I noticed that this is because Node.js (all services are written in Node.js) ignores the proxy settings set via the environment. To make Node.js use these proxy settings, I would have to refactor a lot of components in a lot of services... a workload that is quite tremendous and possibly dangerous to perform, given the different protocols and ports I use to connect to different infrastructural services outside the cluster...
I expect there to be a better solution for this; I expect built-in functionality that forces all internet access from the containers to go through this proxy, a setting I don't have to make in the code of my implementations; I expect a wrapping solution that I can control in a central manner.
Reading this again, I think maybe I should have tested the Docker client configuration on the same page to see whether it has the desired effect, but I assume both approaches have the same outcome, since they are described on the same page with no noticeable difference mentioned in the documentation.
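For reference, the client configuration described on that page looks roughly like this in ~/.docker/config.json (proxy addresses are placeholders); note that it also works by injecting the HTTP_PROXY/HTTPS_PROXY environment variables into containers at creation time, so a runtime that ignores those variables is not helped by it either:

    {
      "proxies": {
        "default": {
          "httpProxy": "http://proxy.example.com:3128",
          "httpsProxy": "http://proxy.example.com:3129",
          "noProxy": "localhost,127.0.0.1,.internal.example.com"
        }
      }
    }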
My question is: is there a solution, which I just don't seem to be able to find, that wraps this proxy functionality around all the services? Or is it a requirement to solve these issues in the implementations themselves?
My thought is to maybe depend on an image, which in turn depends on the Node.js image I use today, that is responsible for this wrapping functionality, though still at the implementation level. Doing so would, however, still mean inheriting a distributed solution of this kind: if I need to change the proxy configuration, I need to change it everywhere and redeploy everything, absent a less complex solution with a common data-access layer.

How to deploy a kubernetes cluster on multiple physical machines in the best manner?

I recently finished a project in which I created an app consisting of several Docker containers. The purpose of the app was to collect some data, save it to a database, and allow user interaction over a simple web GUI. The app was hosted on four different Raspberry Pis, and it was possible to collect data from all physical machines through an API. Furthermore, you could do some simple machine-learning tasks, like detecting anomalies in the sensor data from the Pis.
Now I'm trying to take the next step and use Kubernetes for load balancing and remote updates. My main goal is to remotely update all Raspberries from my master node, which, in theory, would be a very handy feature. I also want to share the resources of the Pis within the cluster for calculations.
I have read a lot about Kubernetes, Minikube, K3s, kind, and all the different approaches to setting up a Kubernetes cluster, but I feel like I am missing the last puzzle piece.
From what I understand, I need an approach that lets me set up a local (all machines are sitting on my desk; no cloud needed) multi-node cluster. My master node would ideally be my laptop, running Ubuntu in a virtual machine. My Raspberries would be my worker nodes. If I want to update my cluster, I can use Kubernetes' remote update functionality.
So my question is: does it make sense to use several Raspberries as nodes in a Kubernetes cluster and manage them from one master node (laptop), and do you have any suggestions on how to achieve this setup?
I usually don't like questions like this that don't contain any specific code or concrete questions of my own, but I feel like a simple hint could accelerate my project noticeably. If this is the wrong place, please feel free to delete this question.
Best regards
You didn't mention which rpi models you are using, but I assume you are not using rpi zeros.
My main goal is to remote update all raspberries from my master node.
Assuming that by that you mean updating your applications running in the Kubernetes installed on the RPis, keep reading. Otherwise ignore everything I wrote; what you probably need is Ansible or another similar provisioning/configuration-management/application-deployment tool.
Now answering to your question:
Does it makes sense to use several rasberries as nodes in a kubernetes cluster
Yes; this is why people created k3s, so such a setup is possible using fewer resources.
and to manage them from one master node (laptop)
Assuming you will be using it for learning purposes, why not. It is possible, but just be aware that when the master node goes down (e.g. when you turn off your laptop), the whole cluster goes down (or at least API-server communication, so you won't be able to change the cluster's state). Also make sure you are using a bridged networking interface for your VM so it is visible on your local network as a standalone instance.
and do you have any suggestions about the way to achieve this setup.
Installing k3s on all nodes would be the easiest option in your case. There are plenty of resources on the internet explaining how to achieve it.
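As a rough sketch, following the defaults from the k3s documentation (the master address and token are placeholders):

    # on the master node (your laptop VM)
    curl -sfL https://get.k3s.io | sh -

    # grab the join token from the master
    sudo cat /var/lib/rancher/k3s/server/node-token

    # on each Raspberry Pi, join the cluster as an agent/worker
    curl -sfL https://get.k3s.io | K3S_URL=https://<master-ip>:6443 K3S_TOKEN=<token> sh -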
One last thing I would like to explain is the thing with updates.
Speaking of Kubernetes updates, you need to know that Kubernetes does not update itself automatically; you need to update it explicitly. A new k8s version is released every three months, and it sometimes "breaks" things without backward compatibility (so always read the changelog before updating, because a rollback may not be possible unless you backed up the etcd cluster beforehand).
Speaking of updating applications: to run your app, all you do is send YAML files describing your application to k8s, and it handles the rest. So if you want to update your app, just update the tag on the container image to the newer version and k8s will handle the rollout. Read more about update strategies in k8s.
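As a minimal sketch (the deployment and image names are placeholders):

    # bump the image tag; k8s performs a rolling update according to the deployment's update strategy
    kubectl set image deployment/my-app my-app=myrepo/my-app:v2

    # watch the rollout progress
    kubectl rollout status deployment/my-app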

docker stack with overlay network & name resolution

I'm totally new to Docker and started yesterday with some tutorials. I want to build a small test application consisting of several different services (replicated and so on) that interact with each other, and I encountered a problem regarding 'service discovery'. I started with the get-started tutorials on docker.com, and at the moment I'm not really sure what the best practice is in the Docker world to let the different containers in a network get to know each other...
As this is a rather vague problem description, let me try to make it more precise. I want to use a few independent services (e.g. Postgres, MongoDB, Redis and RabbitMQ...) together with a set of worker nodes to which work is assigned by a dedicated master node. Since it seems quite convenient, I wanted to use a docker-compose.yml file to define all my services and deploy them as a stack.
Moreover, I created a custom network, and since it does not seem possible to attach a stacked service to a bridge network, I created an attachable overlay network.
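For illustration, a minimal docker-compose.yml along the lines described (image names are placeholders):

    version: "3.7"
    services:
      redis:
        image: redis:5
        networks:
          - backend
      worker:
        image: my-worker:latest    # placeholder image
        networks:
          - backend
    networks:
      backend:
        driver: overlay
        attachable: true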
To finally get to the point: even though the services are deployed correctly, their actual container names are random, and without using some kind of service registry I'm not able to resolve their addresses.
A simple solution would be to use single containers with fixed container names; however, this does not seem to be a best-practice solution (even though it is effectively just Docker-based DNS keyed on container names rather than domain names). Another problem is the randomly generated container names, which contain underscores and hence are not valid addresses that can be resolved...
best regards
Have you looked at something like Kubernetes? To quote from the home page:
It groups containers that make up an application into logical units for easy management and discovery.

How can I deploy a crate cluster on Giant Swarm?

I have been trying to set up a working Crate cluster on Giant Swarm for quite a while now, but haven't been really successful so far.
Here are my attempts so far:
Using multicast and deploying crate as a single component. This works if all instances of this component end up on the same host, unfortunately this isn't reliable.
Using unicast with two components, each of which exposes port 4300 via a URL. This results in messages being received by each component at the discovery interval (every 30 seconds by default). Unfortunately, Giant Swarm only supports HTTP on its URLs, so all that arrives are error messages, because the components are sending something other than HTTP.
Using unicast with two components and trying to discover them via their IPs. I set up Giant Swarm dependencies from one component to the other (circular dependencies are not supported). I can't get this to work because Giant Swarm doesn't allow me to run scripts before the Docker container is created (which is what this blog post uses to run Crate on the Google Cloud Platform), and Docker does not support bidirectional linking.
I am out of ideas at this point; is there something I am missing about either Crate or Giant Swarm? The only example I have seen so far of something similar working is the blog post I linked above, and it uses a mechanism I cannot use on Giant Swarm.
I would appreciate any kind of input on how to make this work or ideas leading in the right direction.
The current (May 2015) answer is: On a private Giant Swarm cluster, which we provide to customers on request, we support Multicasting. So the road is paved there for Crate.IO clustering.
We use Weave for the networking part.
Edit September 2015: We just published a blog post explaining how to set up a Crate cluster on Giant Swarm.
I'm sorry to say that it is not possible to deploy a Crate cluster (>1 node) on Giant Swarm at the moment (due to reasons you've mentioned). We (Crate.IO) are already in contact with Giant Swarm regarding that.

Linked Docker Containers with Mesos/Marathon

I'm having great success so far using Mesos, Marathon, and Docker to manage a fleet of servers and the containers I'm placing on them. However, I'd now like to go a bit further and start doing things like automatically linking an haproxy container to each main Docker service that starts, or providing other daemon-based, containerized services that are linked to and available only to the single parent container.
Normally, I'd start up the helper service first with some name, then when I started the real service, I'd link it to the helper and everything would be fine. How does this model fit in to Marathon and Mesos though? It seems for now at least that the containerization assumes a single container.
I had one idea to start the helper service first, on whatever host it could find, then add a constraint to the real service that the hostname = helper service's hostname, but that seems like it'd cause issues with resource offers and race conditions for those resources.
I've also thought to provide an "embed", or "deep-link" functionality to docker, or to the executor scripts that start the docker containers.
Before I head down any of these paths, I wanted to find out if someone else had solved this problem, or if I was just horribly over thinking things.
Thanks!
you're wandering in uncharted territory! ☺
There are multiple approaches here; and none of them is perfect, but the situation will improve in future versions of Docker, thanks to orchestration hooks.
One way is to use good old service discovery and registration. That is, when a service starts, it figures out its publicly available address and registers itself in e.g. ZooKeeper, etcd, or even Redis. Since it's not trivial for a service to figure out its publicly available address (unless you adopt some conventions, e.g. always mapping port X:X instead of letting Docker assign random ports), you might want to do the registration from outside. That means that your orchestration layer (Mesos in this case) would start the container, then figure out the host and port, and put that into your service discovery system. I'm not extremely familiar with Marathon, but you should be able to register a hook for that. Then, other containers just look up the endpoint address in the service discovery registry, plain and simple.
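As a sketch of the "register from outside" idea, using the etcd v2 CLI (the key layout, address and port are made up for the example):

    # after the orchestration layer has started the container and discovered its host:port,
    # publish that endpoint under a well-known key
    etcdctl set /services/my-postgresql-db/instance-1 '10.0.0.5:49153'

    # consumers look it up (or watch it) instead of hard-coding an address
    etcdctl get /services/my-postgresql-db/instance-1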
You could also look at Skydock, which automatically registers DNS names for your containers with Skydns. However, it's currently single-host, so if you like that idea, you'll have to extend it somehow to support multiple hosts, and maybe SRV records.
Another approach is to use "well-known entry points". This is actually a simplified case of service discovery: you make sure that your services always run on pre-set hosts and ports, so that you can use those addresses statically. Of course, this is bad (because it will make your life harder when you want to reproduce the environment for testing/staging purposes), but if you have no clue at all about service discovery, it can be a start.
You could also use Pipework to create one (or multiple) virtual networks spanning multiple hosts, and bind your containers together. Pipework lets you assign IP addresses manually or automatically through DHCP. I wouldn't generally recommend this approach, but it's a good fit if you also want to plug your containers into an existing network architecture (e.g. VLANs...).
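For reference, Pipework usage is roughly a one-liner per container (the bridge name, container name and address are placeholders):

    # attach the container to host bridge br1 with a static address
    pipework br1 my-container 192.168.1.10/24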
No matter which solution you decide to use, I highly recommend "pretending" that you're using links. I.e., instead of hard-coding your app configuration to connect to (random example) my-postgresql-db:5432, use the environment variables DB_PORT_5432_TCP_ADDR and DB_PORT_5432_TCP_PORT (as if it were a link), and set those variables when starting the container. That way, if you "fold down" your containers into a simpler environment without service discovery etc., you can easily fall back on links without effort.
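For example, using the variable names mentioned above (the image name, address and port are placeholders):

    # start the app with link-style variables pointing at wherever the database actually runs
    docker run -d \
      -e DB_PORT_5432_TCP_ADDR=10.0.0.5 \
      -e DB_PORT_5432_TCP_PORT=49153 \
      my-app:latest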
