Limit starting containers after complete shutdown - docker

We have a bare metal Docker Swarm cluster, with a lot of containers.
And recently we have a full stop on the physical server.
The main problem, happened on Docker startup where all container tried to start on the same time.
I would like to know if there is a way to limit the amount of starting container?
Or if there is another way to avoid overloading the physical server.

At present, I'm not aware of an ability to limit how fast swarm mode will start containers. There is a todo entry to add an exponential backoff in the code and various open issues in swarmkit, e.g. 1201 that may eventually help with this scenario. Ideally, you would have an HA cluster with nodes spread in different AZ's, and when one node fails, the workload would migrate to another node and you do not end up with one overloaded node.
What you can use are resource constraints. You can configure each service with a minimum CPU and memory reservation. This would prevent swarm mode from scheduling more containers on a node than it could handle during a significant outage. The downside is that some services may go unscheduled during an outage and you cannot prioritize which are more important to schedule.

Related

Unsure on how to Orchestrate docker containers

Im new to docker and am wanting to accomplish something but I am unsure on how to Orchestrate my docker containers to do this.
What I want to do:
I have an API that in simple does a calculation from a requested file. It loads the file (around 80mb) from disk to memory then keep it in memory for 2 hours (caching).
Im wanting to have an architecture where for example when the container gets overwhelmed with requests a new one fires up, and when the original container frees its memory and the requests slow down then the container shuts down.
Is Memory and CPU Container Orchestration possible?
Thank You,
/Jeremy
Docker itself is not dedicated to the orchestration multiple containers. You need to use some container orchestration environment. The most popular are Kubernetes, Docker Swarm, and Apache Mesos. Or if you want to run in the Cloud, then some vendor-specific, like AWS ECS.
Here's a good list of container clustering toolkit.
In all these environments it's possible to configure what you described. If you're completely new to the topic, then I recommend installing Docker-for-Desktop which comes with built-in Kubernetes and play with that in your local.
For sure, container orchestration system is what you want to be able efficiently manage your docker containers.
You can find current complete list of solutions for production environment in this spreadsheet
Tools, like kubernetes will give you reach set of benefits eg
Provisioning and deployment of containers
Redundancy and availability of containers
Scaling up or removing containers to spread application load evenly
across host infrastructure
Allocation of resources between containers
Load balancing of service discovery between containers
Health monitoring of containers and hosts
In Kubernetes there is a Horizontal Pod Autoscaler, that
automatically scales the number of pods in a replication controller,
deployment, replica set or stateful set based on observed CPU
utilization (or, with custom metrics support, on some other
application-provided metrics). Note that Horizontal Pod Autoscaling
does not apply to objects that can’t be scaled, for example,
DaemonSets.
As for beginning I would recommend you start with minikube.
More advanced ways are setup manually cluster using kubeadm either look into the cloud providers
Please be aware that you will not have option to modify cloud based control plane. More info in my related answer

Does it make sense to run multiple similar processes in a container?

a brief background to give context on the question.
Currently my team and i are in the midst of migrating our microservices to k8s to lessen the effort of having to maintain multiple deployment tools & pipelines.
One of the microservices that we are planning to migrate is an ETL worker that listens to messages on SQS and performs multi-stage processing.
It is built using PHP Laravel and we use supervisord to control how many processes to run on each worker instance on aws ec2. Each process basically executes a laravel command to poll different queues for new messages. We also periodically adjust the number of processes to maximize utilization of each instance's compute power.
So the questions are:
is this method of deployment still feasible when moving to k8s? Is there still a need to "maximize" compute usage? Are we better off just running 1 process in each container using the "container way" (not sure what is the tool called. runit?)
i read from multiple sources (e.g https://devops.stackexchange.com/questions/447/why-it-is-recommended-to-run-only-one-process-in-a-container) that it is ideal that for a container to run only 1 process. There's also the case of recovering crashed processes and how running supervisord might interfere with how container performs recovery. But i am not very sure if it applies for our use case.
You should absolutely restructure this to run one process per container and one container per pod. You do not typically need an init system or a process manager like supervisord or runit (there is an argument to have a dedicated init like tini that can do the special pid-1 things).
You mention two concerns here, restarting failed processes and process placement in the cluster. For both of these, Kubernetes handles these automatically for you.
If the main process in a Pod fails, Kubernetes will restart it. You don't need to do anything for this. If it fails repeatedly, it will start delaying the restarts. This functionality only works if the main process fails – if your container's main process is a supervisor process, you will never get a pod restart and you may not directly notice if a process can't start up at all.
Typically you'll run containers via Deployments that have some number of identical replica Pods. Kubernetes itself takes responsibility for deciding which node will run each pod; you don't need to manually specify this. The smaller the pods are, the easier it is to place them. Since you're controlling the number of replicas of a pod, you also want to separate concerns like Web servers vs. queue workers so you can scale these independently.
Kubernetes has some ability to auto-scale, though the typical direction is to size the cluster based on the workload: in a cloud-oriented setup if you add a new pod that requests more CPUs than your cluster currently has available, it will provision a new node. The HorizonalPodAutoscaler is something of an advanced setup, but you can configure it so that the number of workers is a function of your queue length. Again, this works better if the only thing it's scaling is the worker pods, and not a collection of unrelated things packaged together.

Is there a best practice to reboot a cluster

I followed Alex Ellis' excellent tutorial that uses kubeadm to spin-up a K8s cluster on Raspberry Pis. It's unclear to me what the best practice is when I wish to power-cycle the Pis.
I suspect sudo systemctl reboot is going to result in problems. I'd prefer not to delete and recreate the cluster each time starting with kubeadm reset.
Is there a way that I can shutdown and restart the machines without deleting the cluster?
Thanks!
This question is quite old but I imagine others may eventually stumble upon it so I thought I would provide a quick answer because there is, in fact, a best practice around this operation.
The first thing that you're going to want to ensure is that you have a highly available cluster. This consists of at least 3 masters and 3 worker nodes. Why 3? This is so that at any given time they can always form a quorum for eventual consistency.
Now that you have an HA Kubernetes cluster, you're going to have to go through every single one of your application manifests and ensure that you have specified Resource Requests and Limitations. This is so that you can ensure that a pod will never be scheduled on a pod without the required resources. Furthermore, in the event that a pod has a bug that causes it to consume a highly abnormal amount of resources, the limitation will prevent it from taking down your cluster.
Now that that is out of the way, you can begin the process of rebooting the cluster. The first thing you're going to do is reboot your masters. So run kubectl drain $MASTER against one of your (at least) three masters. The API Server will now reject any scheduling attempts and immediately start the process of evicting any scheduled pods and migrating their workloads to your other masters.
Use kubectl describe node $MASTER to monitor the node until all pods have been removed. Now you can safely connect to it and reboot it. Once it has come back up, you can now run kubectl uncordon $MASTER and the API Server will once again begin scheduling Pods to it. Once again use kubectl describe $NODE until you have confirmed that all pods are READY.
Repeat this process for all of the masters. After the masters have been rebooted, you can safely repeat this process for all three (or more) worker nodes. If you properly perform this operation you can ensure that all of your applications will maintain 100% availability provided they are using multiple pods per service and have proper Deployment Strategy configured.

how to schedule job with the monitoring of cpu,memory, disk io, etc.

My problem is I have a dedicate server, but the resources are still limited, i.e. IO, memory, CPU, etc.
I need to run a lot of jobs every day. Some jobs are io intensive some jobs are computation intensive. Is there a way to monitor the current status and decide when to start a new job from my job pool or not.
For example, when it knows the current running job are io intensive, it can lunch a job which do not relay on much of io. Or it can choose a running job which use a lot of disk io, stop it, re-schedule it later.
I come up with the solution with docker,since it can monitor the process, but I do not know such kind of scheduler build on top of docker.
Thanks
You can check the docker stats command in order to get basic metrics on what is running in the containers managed by a docker daemon.
You cannot exactly assign a job to a node depending on its dynamic behavior. That would mean to know in advance what type of resource a job will use. Which is not described in docker at all.
Docker provides a way to tag nodes which enable swarm filers, which would enable a cluster manager like swarm to select the right node based on criteria represented by a tag.
But Docker doesn't know about the "job" about to be launched.
Depending on the Docker version you're on, you have a number of options for prod. You can use the native Docker Swarm (just went GA in v1.9), you can give the more mature Kubernetes a try or HashiCorp's Nomad (early days) and there's of course Apache Mesos+Marathon. See also this comparison for more info on the topic.

CoreOS/etcd cluster minimum hosts

I'm evaluating a strategy for implementing docker for a small company with 2 servers. We wanted to have them both working as a cluster, to load balance the work, but to work as a fail-safe for one another in case of failure.
From what I understand, etcd requires a minimum of 3 up hosts or you lose the ability to put/get keys. That would not be possible with 2 machines, and with 3 machines none could fail. Is this assessment correct?
The only solution would be to have a single etcd but that would mean that if the machine that failed was the "etcd"-one then both would stop working correctly...
Just to clarify, I wanted the benefits of something like fleetd's scheduling and clustering abilities but with a small sized deploy. Moving containers/systemd-units and data manually between hosts is my backup plan, but less than ideal.
You can run coreos with only 2 hosts, however you will lose your etcd cluster once you don't have a quorum, with only 2 machines this is possible if both are rebooted. With 3 hosts, you have a much higher likelihood of having a quorum if all machines are rebooted.
If you are willing to have one always be considered master, you can do this, you just have to be sure that you understand how to make an etcd peer consider itself master if quorum is lost.
If you have static IP's, then you have more control over your cluster and should be fine with setting the cluster IP's then even if both servers are restarted, they should be able to discover each other and reach a stable state.
Take a looks at the docs

Resources