We're running spring-cloud microservices using Eureka on AWS ECS. We're also doing continuous deployment, and we've run into an issue where rolling production deployments cause a short window of service unavailability. I'm focusing here on @LoadBalanced RestTemplate clients using Ribbon. I think I've gotten retry working adequately in my local testing environment, but I'm concerned about the Eureka registration lag for new service instances and the way ECS rolling deployments work.
When we merge a new commit to master, if the build passes (compiles and tests pass), our Jenkins pipeline builds and pushes a new docker image to ECR, then creates a new ECS task definition revision pointing to the updated docker image and updates the ECS service. As an example, we have an ECS service definition with desired task count set to 2, minimum healthy percent set to 100%, and maximum percent set to 200%. The ECS service scheduler starts 2 new docker containers using the new image, leaving the existing 2 docker containers running on the old image. We use container health checks that pass once the actuator health endpoint returns 200, and as soon as that happens, the ECS service scheduler stops the 2 old containers running on the old docker image.
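For reference, a minimal sketch of that deployment configuration via the AWS CLI; the cluster and service names here are placeholders:
aws ecs update-service \
  --cluster my-cluster \
  --service my-service \
  --desired-count 2 \
  --deployment-configuration minimumHealthyPercent=100,maximumPercent=200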
My understanding here could be incorrect, so please correct me if I'm wrong about any of this. Eureka clients fetch the registry every 30 seconds, so there's a window of up to 30 seconds during which the only entries in the client's server list are the old service instances, and retry won't help there.
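One mitigation I'm considering is shortening those client-side caches. A rough sketch using docker run locally, assuming Spring Boot's SPRING_APPLICATION_JSON override works here and that I have the Ribbon refresh key right; the values and image name are illustrative:
docker run \
  -e SPRING_APPLICATION_JSON='{"eureka":{"client":{"registryFetchIntervalSeconds":5}},"ribbon":{"ServerListRefreshInterval":5000}}' \
  my-registry/my-service:latest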
I asked AWS support about how to delay ECS task termination during rolling deploys. When ECS services are associated with an ALB target group, there's a deregistration delay setting that ECS respects, but no such option exists when a load balancer is not involved. The AWS response was to run the java application via an entrypoint bash script like this:
#!/bin/bash

cleanup() {
    date
    echo "Received SIGTERM, sleeping for 45 seconds"
    sleep 45
    date
    echo "Killing child process"
    # forward the signal to the script's process group, which includes the java child
    kill -- -$$
}

trap 'cleanup' SIGTERM

# run the command passed as arguments (the java process) in the background
"$@" &
wait $!
When ECS terminates the old instances, it sends SIGTERM to the docker container; this script traps it, sleeps for 45 seconds, then continues with the shutdown. I'll also have to change an ECS config parameter in /etc/ecs that controls the grace period before ECS sends a SIGKILL after the SIGTERM, which defaults to 30 seconds and is not quite long enough.
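I believe the parameter in question is the agent-level ECS_CONTAINER_STOP_TIMEOUT in /etc/ecs/ecs.config on each container instance; a minimal sketch, with an illustrative 90-second value:
echo "ECS_CONTAINER_STOP_TIMEOUT=90s" >> /etc/ecs/ecs.config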
This feels dirty to me. I'm not sure that script isn't going to cause some other unforeseen issue; does it forward all signals appropriately? It feels like an unwanted complication.
Am I missing something? Can anyone spot anything wrong with AWS support's suggested entrypoint script approach? Is there a better way to handle this and achieve the desired result, which is zero downtime rolling deployments on services registered in eureka on ECS?
Related
Currently I'm researching how our dockerised microservices could be orchestrated on AWS.
The Fargate option of ECS looks promising, eliminating the need to manage EC2 instances.
However, it takes a surprisingly long time to start a "task" in Fargate, even for a simple one-container setup. 60 to 90 seconds is typical for our Docker app images, and I've heard it can take even longer, on the order of minutes.
So the question is: while Docker containers typically start in, say, seconds, what exactly is the reason for such overhead in the Fargate case?
P.S. Searching related questions turns up these possible explanations:
Docker image load/extract time
Load balancer influence: registering, health check grace period, etc.
But even in the simplest possible config, with no load balancer deployed and assuming the Docker image is not cached in ECS, it is still at least ~2 times slower to start a task with a single Docker image in Fargate (~60 sec) than to launch the same Docker image on a bare EC2 instance (~25 sec).
Yes, it takes a little longer, but we can't generalize the startup time for Fargate. You can reduce this time by tweaking some settings.
vCPU directly impacts the startup time, so keep in mind that on a bare EC2 instance you have the complete vCPU at your disposal, while with Fargate you may be assigning only a portion of it.
Since AWS manages the servers for you, they have to do a few things under the hood: attaching the VM to your VPC, downloading and extracting the Docker image, assigning IPs, and running the container. All of that can take this much time.
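For example, a rough sketch of registering a Fargate task definition with a larger CPU share via the AWS CLI; the family name and sizes are illustrative, and the container definitions file is assumed to exist:
aws ecs register-task-definition \
  --family my-app \
  --requires-compatibilities FARGATE \
  --network-mode awsvpc \
  --cpu 1024 --memory 2048 \
  --container-definitions file://container-definitions.json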
It's a nice blog, and at the end of the following article you can find good practices:
Analyzing AWS Fargate
Problem domain
Imagine that a stateful container is being managed by Swarm, e.g. a database, and another container is relying on it, e.g. a service that is executing a long-running job (minutes, sometimes hours) that does not tolerate the database (or even itself) going down while it's executing.
To give an example, a database importing a multi GB dump.
There's also a CI/CD system in place which takes care of building new versions of the containers and deploying them to the Swarm, or pushing the image to Docker Hub, which then calls a defined webhook that fires off the deployment event.
Question
Is there any way I can build my containers so that Swarm can know whether it's ok to update it or not? Similarly how HEALTHCHECK reports whether it needs to be restarted, something that would let Swarm know that 'It's safe to restart this container now'.
Or is it the CI/CD system's responsibility to check whether the stateful containers are safe to restart, and only then issue the update command to swarm?
Thanks in advance!
Docker will not check with a container if it is ready to be stopped, once you give docker the command to stop a container it will perform that action. However it performs the stop in two steps. The first step is a SIGTERM that your container can trap and gracefully handle. By default, after 10 seconds, a SIGKILL is sent that the Linux kernel immediately applies and cannot be trapped by the container. For your goals, you'll want to make sure your app knows when it's safe to exit after receiving the first signal, and you'll probably want to extend the time to much longer than 10 seconds between signals.
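As a rough sketch, that window can be extended both on a plain container and on a swarm service; the names and values here are illustrative:
docker stop --time 120 my-container
docker service create --name my-service --stop-grace-period 2m my-image:latest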
The healthcheck won't tell docker that your container is at a safe point to stop. It does tell swarm when your container has finished starting, or when it's misbehaving and needs to be stopped and replaced. The healthcheck defines a command to run inside your container, and the exit code is checked for whether it's 0 (healthy) or 1 (unhealthy). No other exit codes are currently valid.
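A minimal example of attaching such a healthcheck to a swarm service from the CLI; the endpoint and image are hypothetical:
docker service create --name my-service \
  --health-cmd 'curl -f http://localhost:8080/health || exit 1' \
  --health-interval 10s --health-retries 3 \
  my-image:latest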
If you need more than the simple signal handling inside the container, then yes, you're likely moving up the stack to a ci/cd tool to manage the deployment.
Using a simple server
I was using a simple node (CentOS or Ubuntu) to run my web application, and I also configured some cron jobs there to run scheduled tasks. At that point everything worked.
Using Docker Swarm Cluster
I migrated my application to a Docker Swarm cluster. Now the crons are running in multiple containers at the same time, and that is critical for me. I know Docker is working on a new feature called jobs, but I need a solution for now. I would like to know if there is any way to run just one cron job process.
Blocker
The crons are running tasks like:
creating reports about a process.
sending notifications to other services.
updating data in the application.
The crons need to be run on the server because they were configured to use interfaces and endpoints via the php command.
My Problem
I created multiple instances of the same docker service to provide availability. All the instances are running in a cluster of 3 nodes, and each of them runs its cron jobs at the same time in parallel, but I would like to run just one job per docker service.
Maybe a solution would be to periodically create a docker service with restart condition none and replicas 1, or to create a cron container with replicas 1 and restart condition any that acts as the scheduler, attaching a volume with the required cron scripts; something like the sketch below.
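The image and volume names here are made up:
docker service create --name cron-runner --replicas 1 --restart-condition any \
  --mount type=volume,source=cron-scripts,target=/etc/cron.d \
  my-cron-image:latest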
There are multiple options.
Use a locking mechanism, locking over NFS or a database (MySQL, Redis, etc.). You execute each job like this: /usr/local/bin/locker /path/to/script args. It may be good to give the locker options either to wait for the lock or to fail immediately if the lock is not available (blocking vs. non-blocking). That way, if the job is long-running, only the first instance will acquire the lock and the others will fail. You may want to reuse existing software that simplifies the hard job of creating reliable locks; see the flock sketch after this list.
Use leader election. When running in a swarm, there must be a mechanism to query the list of containers. List only the cron containers and sort them alphabetically; if the current container's id is the first one, then allow execution. In pseudo-shell (get-containers is a placeholder): first=$(get-containers cron | sort | head -n 1); if [[ "$current_id" == "$first" ]]; then ...; fi
Run the cron outside of the cluster, but use it to trigger jobs within the cluster over the load balancer. The load balancer would pick exactly one container to execute the job. For example: curl -H 'security-key: xxx' http://the.cluster/my-job.
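For option 1, a minimal sketch using util-linux flock over a shared volume; the paths and job script are illustrative, and lock semantics over NFS depend on the NFS setup:
flock --nonblock /shared/locks/report.lock /usr/local/bin/create-report.sh \
  || echo "another instance holds the lock, skipping"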
I'm sure there are swarm-specific tools/methods available. A link.
I have a swarm with 3 nodes. On it, I want to launch one service for a database and then another, with some replicas, that runs a python application. The program will take approximately 30 minutes to finish. After that, the container is shut down and a new one starts. Sometimes, however, a problem occurs and the container does not stop. Is there any option that I can use when I launch the service so that, after 1 hour, a container is automatically killed and a new one is created?
You can create an application using the Docker Remote API that automatically deploys the container to the swarm, waits for one hour, and then deletes it. This is not a feature to look for in Docker itself; you should implement it manually using the Docker API.
You can find here a complete list of Docker libraries to help you get started.
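As a rough sketch of the same idea using the docker CLI instead of the Remote API; the service name and image are made up:
docker service create --name batch-run --restart-condition none --replicas 1 my-image:latest
sleep 3600
docker service rm batch-run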
Kubernetes deployments and replication controllers give the ability to self-heal by ensuring that a minimum number of replicas is present.
The autoscaling features also allow increasing replicas given a specific CPU threshold.
Are there tools available that would provide flexibility in the auto-healing and auto-scaling features?
Example:
Auto-adjust the number of replicas during peak hours or days.
When a pod dies due to external issues, prevent the system from re-creating the container and wait for a condition to succeed, e.g. a ping or telnet test.
You can block pod startup by waiting for external services in an entrypoint script or init container. That's the closest that exists today to waiting for external conditions.
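A minimal sketch of that entrypoint approach; the dependency host and port are hypothetical:
#!/bin/bash
# block until the external dependency answers before starting the real process
until nc -z db.example.internal 5432; do
  echo "waiting for database..."
  sleep 5
done
exec "$@"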
There is no time-based autoscaler today, although it would be possible to script one fairly easily on a schedule.
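For example, two commands run from an external scheduler would be enough; the deployment name and replica counts are illustrative:
kubectl scale deployment web --replicas=10   # run at the start of peak hours
kubectl scale deployment web --replicas=3    # run when traffic drops off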
In Openshift, you can easily scale your app by running this command in a cron job.
Scale command
oc scale dc app --replicas=5
And of course, you can scale it down by changing the number of replicas.
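For example, crontab entries like these; the times and replica counts are illustrative:
0 8 * * 1-5  oc scale dc app --replicas=5
0 18 * * 1-5 oc scale dc app --replicas=2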
Autoscale
This is what OpenShift for Developers writes about autoscaling:
OpenShift also supports automatic scaling, defining upper and lower thresholds for CPU usage by pod.
If the upper threshold is consistently exceeded by the running pods for your application, a new instance of your application will be started. When CPU usage drops back below the lower threshold, because your application is no longer working as hard, the number of instances will be scaled back again.
I think Kubernetes has now released version 1.3, which allows autoscaling, but it is not yet integrated in OpenShift.
Health Check
When it comes to health checks, OpenShift has:
readiness probe: checks the status of the test you configure before the router starts sending traffic to the instance.
liveness probe: run periodically once traffic has been switched to an instance of your application, to ensure it is still behaving correctly. If the liveness probe fails, OpenShift will automatically shut down that instance of your application and replace it with a new one.
You can perform these kinds of tests (HTTP check, container execution check, and TCP socket check).
So with these tools I guess you can create readiness and liveness checks to ensure that your pod is running properly; if not, a new deployment will be triggered until the readiness status comes back ok.
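As a rough sketch, both probes can be added from the CLI; the dc name and endpoint path are illustrative:
oc set probe dc/app --readiness --get-url=http://:8080/health
oc set probe dc/app --liveness --get-url=http://:8080/health --initial-delay-seconds=30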