What are the limitations of Docker Swarm compared to Kubernetes?

I need to choose a container orchestrator and would like help identifying the real-world limitations I might hit using Docker Swarm instead of Kubernetes. If you have ever run into any such limitation, please share it.
The cluster may grow to approximately 50-100 containers.

Docker Swarm is young, and a lot of features have been introduced relatively quickly. However, this also causes more issues and open "serious" bugs, which might be a problem for a production system that should be up 100% of the time. I personally hit a bug that made it impossible to start new containers because they were assigned an IP address that was already taken. This forced me to shut down my swarm (it was a dev system, so I didn't mind too much).
I suggest having a look at the most commented Swarm bugs/issues on GitHub.

Related

All Docker stacks are restarting automatically

I have a multi-service environment hosted on Docker Swarm, with multiple stacks. Every running container has an embedded Spring Boot application. The problem is that all my stacks restart on their own. I know my compose file sets the restart_policy to on-failure, which is why they restart automatically. The trouble is that when the services restart, one particular service throws errors, and this breaks everything.
I am not able to figure out what is actually happening.
I did quite a lot of research and found the following:
The Docker daemon is not restarting. I double-checked this against the daemon's uptime.
I checked docker service ps <Service_ID>, where I can see the service's tasks shutting down and starting. There is no other information.
I checked docker service logs <Service_ID>, but there are no errors there either.
I checked for resource shortages. There were plenty of resources available on the host as well as at the container level.
Can someone tell me where to find logs for this? Any other thoughts?
My host is actually a VM hosted on VMware vCenter.
After a lot of research and going through all the Docker logs, I could not find the cause. Later, I discovered that a memory snapshot was being taken for backup every 24 hours.
Here is what I observe:
Whenever we take a snapshot, all Docker services running on the host restart automatically. There are no errors; they simply restart gracefully.
I found other questions reporting the same problem with VMware snapshots.
As far as I know, when a snapshot is taken, the VM points to a different memory location and saves the previous one. I have not been able to find out why the restarts happen, but this was the root cause of the problem. If anyone is a VMware snapshot expert, please let us know.

Docker Swarm CPU overload on deploy with Spring Boot containers

I have created a number of Spring Boot applications, which all work like magic in isolation or when started one after the other manually.
My challenge is that I want to deploy a stack with all the services in a Docker Swarm.
Initially I didn't understand what was going on, as it seemed like all my containers were hanging.
It turns out that starting a single Spring Boot application maxes out my CPU for a good while (20+ seconds to start up).
Now the issue is that Docker Swarm launches 10 of these containers simultaneously, my load average goes above 80, and the system grinds to a halt. The container HEALTHCHECKs start timing out and Docker eventually restarts them. This cycle may or may not stabilize, and when it does, it takes at least 30 minutes. So much for microservices vs big fat Java EE applications :(
Is there any way to convince Docker to roll out the containers one by one? I'm sure this would help a lot.
There is a rolling update parameter (https://docs.docker.com/engine/swarm/swarm-tutorial/rolling-update/), but it does not seem to apply to the initial deployment.
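One way to approximate a one-by-one rollout without built-in Swarm support is to drive it from outside: start one service, wait until it reports healthy, then start the next. The Python sketch below is hypothetical: `staggered_start` is not a Docker API, and `start` and `is_healthy` are placeholder callables you would supply yourself (for example, wrapping `docker service scale` and a check of the container's health status). The timeout and poll interval are arbitrary assumptions.

```python
import time
from typing import Callable, Iterable, List


def staggered_start(
    services: Iterable[str],
    start: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    timeout: float = 120.0,
    poll_interval: float = 2.0,
) -> List[str]:
    """Start services one at a time, waiting for each to report healthy.

    Hypothetical helper: `start` and `is_healthy` are caller-supplied hooks.
    Returns the services in the order they became healthy, or raises
    TimeoutError if one does not become healthy within `timeout` seconds.
    """
    started: List[str] = []
    for name in services:
        start(name)  # e.g. scale this one service up from 0 replicas
        deadline = time.monotonic() + timeout
        # Block here so the next service's CPU-heavy startup does not overlap.
        while not is_healthy(name):
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{name} not healthy after {timeout}s")
            time.sleep(poll_interval)
        started.append(name)
    return started
```

Because each service must pass its health check before the next one starts, the CPU spike from Spring Boot startup is spread out instead of hitting all 10 containers at once, at the cost of a longer total deployment.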
Your help will be greatly appreciated.
I've also tried systemd (which isn't ideal for distributed microservices). It worked slightly better than Docker, but had the same issue when deploying all the applications at once.
Initially I wanted to try Kubernetes, but I've got enough on my plate and if I can get away with Docker Swarm, that would be awesome.
Thanks!

Is it best practice to daemonize a process within docker?

Many best-practice guides emphasize making your process a daemon and having something watch it and restart it in case of failure. This made sense for a while. A specific example is Sidekiq:
bundle exec sidekiq -d
However, when building with Docker, I've found myself simply running the command in the foreground. If the process stops or exits abruptly, the entire Docker container poofs and a new one is automatically spun up, which is basically the entire point of daemonizing a process and having something watch it. (All STDOUT is sent to CloudWatch / Elasticsearch for monitoring.)
I feel this also reinforces the idea of a single process per Docker container, whereas daemonizing would, in my opinion, tend to encourage violating that general standard.
Is there any best-practice documentation on this, even if you're running only a single process within the container?
You don't daemonize a process inside a container.
The -d is usually seen in the docker run -d command, which uses detached (not daemonized) mode: the Docker container runs in the background, completely detached from your current shell.
For running multiple processes in a container, the background one would be a supervisor.
See "Use of Supervisor in docker" (or the more recent docker run --init).
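To illustrate the foreground style, here is a minimal Python sketch of a worker that stays in the foreground, logs to stdout, and exits cleanly when the container is stopped (`docker stop` sends SIGTERM, then SIGKILL after a grace period). The class `ForegroundWorker` and its methods are hypothetical names for this example, and `do_work` is a placeholder for the real job. Running it under `docker run --init` additionally puts a small init process at PID 1 that forwards signals and reaps zombie processes.

```python
import signal
import time
from typing import Optional


class ForegroundWorker:
    """Hypothetical worker: runs in the foreground, stops cleanly on SIGTERM/SIGINT."""

    def __init__(self, interval: float = 1.0) -> None:
        self.interval = interval
        self.stopping = False

    def _request_stop(self, signum, frame) -> None:
        # Only set a flag; the loop checks it and exits on its own.
        self.stopping = True

    def install_handlers(self) -> None:
        # Must be called from the main thread.
        signal.signal(signal.SIGTERM, self._request_stop)
        signal.signal(signal.SIGINT, self._request_stop)

    def do_work(self) -> None:
        # Placeholder job; log to stdout so the container's log driver collects it.
        print("working", flush=True)

    def run(self, max_iterations: Optional[int] = None) -> int:
        """Loop until a stop is requested; returns the number of iterations run."""
        iterations = 0
        while not self.stopping:
            if max_iterations is not None and iterations >= max_iterations:
                break
            self.do_work()
            iterations += 1
            time.sleep(self.interval)
        print("shutting down", flush=True)
        return iterations
```

In a real entrypoint you would call `install_handlers()` and then `run()` with no iteration limit; `max_iterations` exists only to make the sketch easy to exercise.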
Some relevant 12 Factor App recommendations:
An app is executed in the execution environment as one or more processes
Concurrency is implemented by running additional processes (rather than threads)
Website:
https://12factor.net/
Docker was open-sourced by a PaaS operator (dotCloud), so it's entirely possible the authors were influenced by this architectural recommendation. That would explain why Docker is designed to normally run a single process.
The thing to remember here is that a Docker container is not a virtual machine, although it's entirely possible to make it quack like one. In practice a docker container is a jailed process running on the host server. Container orchestration engines like Kubernetes (Mesos, Docker Swarm mode) have features that will ensure containers stay running, replacing them should the need arise.
Remember my mention of duck vocalization? :-) If you want your container to run multiple processes, it's possible to run a supervisor process that keeps everything healthy and running inside. (A container stops when its main process, PID 1, exits.)
https://docs.docker.com/engine/admin/using_supervisord/
The ultimate expression of this VM envy would be LXD from Ubuntu, where an entire set of VM services gets bootstrapped within LXC containers.
https://www.ubuntu.com/cloud/lxd
So, is it a best practice? I think there is no clear answer. Personally, I'd say no, for two reasons:
I'm fixated on deploying 12 factor compliant applications, so married to the single process model
If I need to run two processes on the same set of data, then in Kubernetes I can run them as containers within the same pod. This means Kubernetes manages the processes (running as separate containers with a common data volume).
Clearly my reasons are implementation specific.
There are multiple process supervisors that can take a foreground process (or several), run them under monitoring, and restart them on failure (or exit the container).
One is runit (http://smarden.org/runit/), which I have not used myself.
My choice is s6 (http://skarnet.org/software/s6/). Someone has already built a container wrapper for it, named s6-overlay (https://github.com/just-containers/s6-overlay), which is what I usually use if/when I need a user-space process to run as a daemon. It also has facilities for doing prep work on container start, changing permissions, and more, at runtime.
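To show what such supervisors do at their core, here is a much simplified Python sketch that runs a single command in the foreground, restarts it when it exits with a non-zero code, and gives up after a fixed number of restarts so that the container itself exits and the orchestrator can replace it. The `supervise` function and its parameters are hypothetical; real tools such as runit and s6 also manage multiple services, forward signals, and reap zombies, none of which this sketch does.

```python
import subprocess
import sys
import time
from typing import Sequence, Tuple


def supervise(
    cmd: Sequence[str], max_restarts: int = 3, backoff: float = 1.0
) -> Tuple[int, int]:
    """Run `cmd` in the foreground, restarting it on non-zero exit.

    Returns (final exit code, number of restarts performed). Once the restart
    budget is spent, the failing exit code is returned so the caller can exit
    with it and let the orchestrator take over.
    """
    restarts = 0
    while True:
        returncode = subprocess.run(list(cmd)).returncode
        if returncode == 0:
            return 0, restarts
        if restarts >= max_restarts:
            return returncode, restarts
        restarts += 1
        print(
            f"child exited with {returncode}; restart {restarts}/{max_restarts}",
            file=sys.stderr,
            flush=True,
        )
        time.sleep(backoff)  # simple fixed delay between restarts
```

As a container entrypoint you would pass the exit code to `sys.exit`, so a process that keeps failing eventually takes the container down with it.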
tl;dr: I can't find a best practices document that relates directly to this for docker, but I agree with you.
The only "Best Practices" for Docker I could find were on Docker's own site, which states that containers should run one process. In my mind, that means foregrounded processes as well. So basically, I've drawn the same conclusion as you. (You've probably read that too, but this is for anyone else reading this.)
Honestly, I think we are still in (relatively) new territory with best practices for Docker. Anecdotally, it has been a best practice in the organizations I've worked with. The number of times I've felt more satisfied with a foregrounded process has been significantly greater than the times I've said to myself "Boy, I sure wish I backgrounded that one." In fact, I don't think I've ever said that.
The only exception I can think of is when you are trying to evaluate software and need a quick and dirty way to ship infrastructure off to someone. E.g.: "Hey, there is this new thing called LAMP stacks I just heard of, here is a Docker container that has all the components for you to play around with". Again, though, that's an outlier, and I would shudder if something like that ever made it to production or even any sort of serious development environment.
Additionally, it certainly forces a micro-architecture style, which I think is ultimately a good thing.

Is anyone using Docker Swarm in production?

I'm just curious how reliable Docker Swarm is, because I'm deciding whether to replace our current physical production infrastructure with Docker Swarm, and I'm not quite sure.
Please share any advice about Docker Swarm, or any URLs with guidance on running Docker Swarm as a production environment.
Thanks.
There are two versions of Swarm.
The original Docker Swarm, introduced in late 2014. It requires external components like Consul, etcd, Registrator, a load balancer, etc. It is being used in production. It is still supported but will probably be supplanted eventually by "Swarm Mode" (my guess).
The new "Swarm Mode", introduced in June 2016. This is a much easier version, as it doesn't require external services. It is, however, very new and still maturing. It is starting to be used in production, but you need to be keenly aware of its limitations.
Overall, both Swarm versions are currently being used in production, but because they are relatively new, especially Swarm Mode, they are not as mature or extensive as alternatives like Kubernetes or Mesos. Then again, it's really a matter of what type of production system you need. Do you need a simple 3-node high-availability system or a fully scalable hundred-node system? "Production system" has become too generic a term.

Networking among kubernetes minions

I installed an 8-node Kubernetes cluster (1 master + 7 minions), but I ran into a networking problem among the minions.
I installed my cluster according to this step-by-step Fedora manual, so I use Fedora 20 with its testing repository to get kubernetes binaries.
After installing, I wanted to try the guestbook example, but it seems to me there is a problem with the inter-container networking.
Although the containers/pods are running and I can reach my 3 frontend containers (via browser) and the Redis containers as well (via netcat), a frontend that is not on the same host as Redis cannot reach the Redis master. The frontend's PHP code returns a network exception.
Can anybody help me understand why the containers cannot reach each other across hosts?
I hope I have described my setup accurately enough. Thanks in advance.
The Fedora guide you followed will only get you running on a single machine. It avoids the issues around setting up networking across nodes.
For Kubernetes to work, the following network setup must be in place:
Every container should be able to talk to every other container, even across nodes. This means also that the bridge IP range for those containers must not overlap.
Code running on any node that isn't in a container should be able to reach every container (and vice versa), even across nodes.
It is not necessary (but useful) for computers on the network that aren't part of the cluster to be able to reach the containers directly.
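The non-overlap requirement for container IP ranges can be illustrated with Python's standard `ipaddress` module. This sketch carves one container subnet per node out of a larger cluster range (roughly what tools like flannel do when they hand each host its own subnet) and checks every pair of nodes for overlap. The cluster range `10.244.0.0/16`, the /24 subnet size, and the function names are assumptions chosen for the example.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def allocate_node_subnets(
    cluster_cidr: str, nodes: List[str], prefixlen: int = 24
) -> Dict[str, ipaddress.IPv4Network]:
    """Assign each node its own, non-overlapping subnet from cluster_cidr."""
    pool = ipaddress.IPv4Network(cluster_cidr).subnets(new_prefix=prefixlen)
    allocation: Dict[str, ipaddress.IPv4Network] = {}
    for node in nodes:
        subnet = next(pool, None)
        if subnet is None:
            raise ValueError(f"{cluster_cidr} has no free /{prefixlen} left for {node}")
        allocation[node] = subnet
    return allocation


def find_overlaps(
    subnets: Dict[str, ipaddress.IPv4Network]
) -> List[Tuple[str, str]]:
    """Return every pair of nodes whose container ranges overlap."""
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(subnets.items(), 2)
        if net_a.overlaps(net_b)
    ]
```

For the seven minions in the question, allocating /24s from `10.244.0.0/16` would give `10.244.0.0/24` through `10.244.6.0/24`, and `find_overlaps` returns an empty list. By contrast, if every node keeps Docker's default `docker0` range of `172.17.0.0/16`, every pair overlaps, which may be part of why cross-host traffic fails in the setup described above.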
There are a lot of ways to achieve this. For instance, the Vagrant setup creates GRE tunnels between each node. On GCE we use features of the platform to do the routing. If you are on physical machines on a switch, you can probably just build one big layer 2 network with bridges. A bulletproof way to get started (but perhaps not the most performant, depending on your setup) is to use something like flannel.
We are working on making this stuff easier to start up (without using a mess of shell scripts) and are thinking of building something like flannel in so that there is a reasonable default.
