Docker Compose and Fargate rollbacks

I'm trying to improve my service by setting a rollback strategy for the case where my changes crash the container and keep the tasks exiting.
Context:
I have a simple service that I update by changing the image tag:
services:
  web:
    image: AWS-Account-Id.dkr.ecr.us-east-1.amazonaws.com/my-service:1
    environment:
      - ENV=dev
    ports: ["80:80"]
I make some change to the Docker image, then build, tag, and push it to ECR. I then update the tag to 2 (for example) and run docker compose up.
Let's say I introduce an error and the container starts but then stops (due to the error). ECS will keep constantly trying to start and stop the container with the error: Essential container in task exited.
Is there a way in docker-compose to set a condition so that if it tries to start the web container 2 times and the tasks fail to reach and maintain RUNNING status, the changes are rolled back or a CloudFormation cancel-update operation is performed?
There is a load balancer that listens on port 80, and I also added a health check to the service:
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost/status"]
  interval: 1m
  timeout: 10s
  retries: 2
  start_period: 40s
But I cannot make it work. Tasks keep exiting and the CloudFormation deployment keeps going.

There is no direct way to do this, but you can consider the following approach:
Create a WaitCondition and a WaitConditionHandle resource.
Calibrate how long it usually takes for the task/container to start and set the timeout accordingly.
Configure the application to post a success signal to the handle's endpoint URL once it has started up successfully.
Ensure that the service and the WaitCondition start updating/creating in parallel.
If no signal arrives within the timeout period, the WaitCondition fails and CloudFormation rolls back the stack update.
One thing to consider: on every update the WaitConditionHandle and WaitCondition resources need to be re-created. An easy way to do that is to change their logical IDs, for example by computing a hash of the parameters/template and appending it as a suffix to the wait condition resources. That way, whenever the parameters or template change, the wait condition resources are recreated automatically.
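A minimal sketch of the two resources, assuming the logical IDs carry a template/parameter hash suffix and that the handle URL is passed to the task (for example as an environment variable) so the application can signal it; the names and values below are illustrative only:

Resources:
  DeployWaitHandleAbc123:
    Type: AWS::CloudFormation::WaitConditionHandle
  DeployWaitConditionAbc123:
    Type: AWS::CloudFormation::WaitCondition
    Properties:
      Handle: !Ref DeployWaitHandleAbc123   # resolves to a presigned S3 URL
      Timeout: "300"                        # seconds; calibrate to the usual task start-up time
      Count: 1

On a successful start-up the application would PUT a small JSON document such as {"Status": "SUCCESS", "UniqueId": "deploy-1", "Data": "ok", "Reason": "container healthy"} to that URL. If no signal arrives before the timeout, the WaitCondition fails and CloudFormation rolls back the stack update.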

Related

Can I limit the active service in a swarm to 1 instance?

I'm currently in the process of setting up a swarm with 5 machines. I'm just wondering if I can (and should) limit the swarm to only allow one active instance of a service, with all the others waiting until they should jump in when the service fails.
This is to prevent potential concurrency problems with MariaDB (as the nodes still write to a NAS), or hitting the connection limit of an external service (like Node-RED with Telegram).
If you're deploying with stack files you can set "replicas: 1" in the deploy section to make sure only one instance runs at a time.
If that instance fails (crashes or exits), Docker will start another one.
https://docs.docker.com/compose/compose-file/deploy/#replicas
If the service is replicated (which is the default), replicas
specifies the number of containers that SHOULD be running at any given
time.
services:
  frontend:
    image: awesome/webapp
    deploy:
      mode: replicated
      replicas: 6
If you want multiple instances running and only one "active" instance hitting the database, you'll have to coordinate that some other way.
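Applied to the question, a minimal stack-file sketch might look like the following (the service name, image and restart_policy values are placeholder assumptions):

services:
  mariadb:
    image: mariadb
    deploy:
      mode: replicated
      replicas: 1                 # at most one task of this service at any time
      restart_policy:
        condition: on-failure     # reschedule the task (possibly on another node) if it exits
        delay: 5s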

How to delay docker swarm rolling updates with an initial sleep?

I'm using Docker Swarm to deploy an application to a single node only, but in order to have zero-downtime deployments.
Question: how can I tell Docker Swarm to wait for X seconds before switching over to the newly started container and taking the old container down?
I know I could add a custom healthcheck, but I'd like to simply define a time interval that blocks the container and gives it some warm-up time before it is taken live.
Or maybe some kind of initial sleep?
deploy:
  replicas: 1
  update_config:
    order: start-first
    failure_action: rollback
    # this did not help. the new container goes live instantly!
    delay: 10s
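For context on why the delay above has no effect: update_config.delay is the pause Swarm inserts between updating successive batches of tasks, not a warm-up period for a single task, and with order: start-first the old task is removed as soon as the new one is considered started. The signal Swarm waits on before treating the new task as started is its health status, so in practice the warm-up is usually expressed through a healthcheck with a generous start_period rather than a plain sleep. A sketch under that assumption (the port, path and timings are made up):

deploy:
  replicas: 1
  update_config:
    order: start-first
    failure_action: rollback
healthcheck:
  test: ["CMD-SHELL", "curl -fs http://localhost:8080/health || exit 1"]
  interval: 10s
  timeout: 5s
  retries: 3
  start_period: 30s   # failures in this window do not count against retries

With this in place, the old container stays up until the new one reports healthy, and failure_action: rollback kicks in if it never does.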

How did the Docker service manage to call an instance from a separate Docker container?

I have recently started using Docker + Celery. I have also shared the full sample code for this example on GitHub, and the following are some snippets from it to help explain my point.
For context, my example is designed to be a node that subscribes to events in a system of microservices. This node comprises the following services:
the Subscriber (using kombu to subscribe to events)
the Worker (using celery for async task acting on the events)
Redis (as message broker and result backend for celery)
The services are defined in a docker-compose.yml file as follows:
version: "3.7"
services:
# To launch the Subscriber (using Kombu incl. in Celery)
subscriber:
build: .
tty: true
#entrypoint: ...
# To launch Worker (Celery)
worker:
build: .
entrypoint: celery worker -A worker.celery_app --loglevel=info
depends_on:
- redis
redis:
image: redis
ports:
- 6379:6379
entrypoint: redis-server
For simplicity, I have left out the code for the subscriber; using the Python interactive shell in the subscriber container should suffice for this example:
python3
>>> from worker import add
>>> add.delay(2,3).get()
5
And in the worker container logs:
worker_1 | [2020-09-17 10:12:34,907: INFO/ForkPoolWorker-2] worker.add[573cff6c-f989-4d06-b652-96ae58d0a45a]: Adding 2 + 3, res: 5
worker_1 | [2020-09-17 10:12:34,919: INFO/ForkPoolWorker-2] Task worker.add[573cff6c-f989-4d06-b652-96ae58d0a45a] succeeded in 0.011764664999645902s: 5
While everything seems to be working, I feel uneasy. I don't think this example respects the isolation principle of a Docker container.
Aren't containers designed to be isolated at the level of their OS, processes and network? And if containers have to communicate, shouldn't it be done via IP addresses and network protocols (TCP/UDP etc.)?
Firstly, the worker and subscriber run the same codebase in my example, so no issue is expected from the import statement.
However, the Celery worker is launched from the entrypoint in the worker container, so how did the subscriber manage to call the Celery worker instance in the supposedly isolated worker container?
To further verify that it is in fact calling the Celery worker instance from the worker container, I stopped the worker container and repeated the Python interactive shell example in the subscriber container. The request waited (which is expected of Celery) and returned the same result as soon as the worker container was turned back on. So IMO, yes, a service in one container is calling an app instance in another container WITHOUT networking, unlike the case of connecting to Redis (using an IP address etc.).
Please advise if my understanding is incorrect, or if there is a wrong implementation somewhere that I am not aware of.
Both the consumer (worker) and the producer (subscriber) are configured to use Redis (redis) as both the broker and the result backend. That is why it all worked: when you executed add.delay(2,3).get() in the subscriber container, it sent the task to Redis, where it got picked up by the Celery worker running in a different container.
Keep in mind that the Python process running the add.delay(2,3).get() code is running in the subscriber container, while the ForkPoolWorker-2 process that executed the add() function and stored the result in the result backend is running in the worker container. These processes are completely independent.
The subscriber process did not call anything in the worker container! In plain English, what it did was say: "here (in Redis) is what I need done; please, workers, do it and let me know you are done so that I can fetch the result".
docker-compose creates a default Docker network for the containers defined in a single file. Since you are pointing everything at the right hostnames, the requests travel over that network, which is why this succeeds. I would be surprised to hear that this still worked if you were to, for example, run each container separately without using docker-compose.
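To make that networking explicit rather than relying on the default network Compose creates, the services could be attached to a named user-defined network; a container started outside that network would not even resolve the redis hostname. A sketch (the network name is made up):

services:
  subscriber:
    build: .
    tty: true
    networks: [celery-net]
  worker:
    build: .
    entrypoint: celery worker -A worker.celery_app --loglevel=info
    depends_on: [redis]
    networks: [celery-net]
  redis:
    image: redis
    networks: [celery-net]
networks:
  celery-net:     # Compose would otherwise create <project>_default automatically
    driver: bridge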

How to perform a health check in a Docker console app

I have a .NET Core 2.2 console app that runs permanently as a HostedService in a Docker container. It is not using ASP.NET Core and it's not an API; it's a pure .NET Core application.
When started, it performs a number of health checks and then starts running.
I'm trying to write a Cake build script that deploys to Docker and runs some Mocha integration tests. However, the integration step doesn't wait for the console app to complete its health checks, and the tests fail.
All the health check guides I have read are for an API. I don't have an API and there are no endpoints. How can I make Docker wait for the console app to be ready?
Thanks in advance
You can use the docker-compose tool and, in particular, its healthcheck option to wait for a container to spin up and do its setup work before proceeding with the integration tests.
You can add a healthcheck block to your app service definition:
myapp:
  image: ...
  healthcheck:
    test: ["CMD", "somescript.sh", "--someswitch", "someparam"]
    interval: 30s
    timeout: 10s
    retries: 4
The command you specify in the test property should verify whether the setup work of the app is completely done. It can be some custom script that is added to the myapp container just for the sake of its health check.
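For a console app with no HTTP endpoints, one option (an assumption here, not something from the question) is to have the app touch a marker file once its startup health checks have passed and to test for that file:

myapp:
  image: ...
  healthcheck:
    # assumes the app writes /tmp/app-ready after its own startup checks succeed
    test: ["CMD", "test", "-f", "/tmp/app-ready"]
    interval: 10s
    timeout: 5s
    retries: 10
    start_period: 30s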
Now, you can add some fake service just to wait for the app service to become healthy. Depending on your particular case, this responsibility can be handed to the container with the integration tests. Either way, this other service should have a depends_on block that mentions the app service:
wait-for-app-setup:
  image: ...
  depends_on:
    myapp:
      condition: service_healthy
As a result, docker-compose will take care of starting things in the correct order and waiting long enough for them to start and set up correctly.

Make docker image as base for another image

I have built a simple GET API to access this database: https://github.com/ghusta/docker-postgres-world-db
The API takes a country code and fetches the full record for that country from the database.
The structure is that the API is in one Docker image and the database is in another.
So when the API's container starts, I need it to start the database's container first, and only then start running itself on top of the database.
How can I do that?
You can use Docker Compose, specifically the depends_on directive. This will cause Docker to start all dependencies before starting a container.
Unfortunately there is no way to make it wait for the dependency to go live before starting any dependents. You'll have to manage that yourself with a wait script or similar.
The most likely solution is to use Docker Compose along with a third-party script.
For example, your docker-compose file might look like this:
services:
  web:
    build: .
    ports:
      - "80:8000"
    depends_on:
      - "db"
    command: ["./wait-for-it.sh", "db:5432", "--", "python", "app.py"]
  db:
    image: postgres
where ./wait-for-it.sh is a third-party script you can get from
https://github.com/vishnubob/wait-for-it
You can also use this script from
https://github.com/Eficode/wait-for
I would recommend tweaking the script to your needs if you want to (I did that).
P.S:
The problem of waiting for a database (for example) to be ready is really just a subset of a much larger problem of distributed systems. In production, your database could become unavailable or move hosts at any time. Your application needs to be resilient to these types of failures.
To handle this, design your application to attempt to re-establish a connection to the database after a failure. If the application retries the connection, it can eventually connect to the database.
The best solution is to perform this check in your application code, both at startup and whenever a connection is lost for any reason.
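As a compose-level alternative to the wait script (assuming a reasonably recent Docker Compose; some older v3/Swarm setups ignore the condition form of depends_on), the database can be given a healthcheck and the API gated on it:

services:
  web:
    build: .
    ports:
      - "80:8000"
    depends_on:
      db:
        condition: service_healthy    # start web only after db reports healthy
  db:
    image: postgres
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]   # pg_isready ships with the postgres image
      interval: 5s
      timeout: 5s
      retries: 10

This removes the need for wait-for-it.sh at startup, though the application-level retry logic described above is still the more robust approach for connections lost at runtime.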
