How to perform a health check in a Docker console app

I have a .NET Core 2.2 console app that runs permanently as a HostedService in a Docker container. It does not use ASP.NET Core and it is not an API; it is a pure .NET Core application.
When started, it performs a number of health checks and then starts running.
I'm trying to write a Cake build script that deploys to Docker and runs some Mocha integration tests. However, the integration step doesn't wait for the console app to complete its health checks, so the integration tests fail.
All the health check guides I have read are for an API. I don't have an API and there are no endpoints. How can I make Docker wait for the console app to be ready?
Thanks in advance

You can use the docker-compose tool and, in particular, its healthcheck option to wait for a container to spin up and do its setup work before proceeding with the integration tests.
You can add a healthcheck block to your app service definition:
myapp:
  image: ...
  healthcheck:
    test: ["CMD", "somescript.sh", "--someswitch", "someparam"]
    interval: 30s
    timeout: 10s
    retries: 4
The command you specify in the test property should verify whether the app's setup work is completely done. It can be some custom script added to the myapp container just for the sake of its health check.
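For a console app with no HTTP endpoint, one possible approach (a minimal sketch; the marker file, the script name, and the use of Python here are assumptions, not part of the original setup) is to have the console app write a readiness marker file once its own startup checks have passed, and have the health check test for that file:

#!/usr/bin/env python3
# healthcheck.py - hypothetical readiness probe for the console app.
# Assumption: the console app writes /tmp/app_ready once its startup
# health checks have passed; until then this script exits non-zero and
# docker-compose keeps reporting the service as not yet healthy.
import os
import sys

READY_MARKER = "/tmp/app_ready"  # assumed path; use whatever your app actually writes

sys.exit(0 if os.path.exists(READY_MARKER) else 1)

The test entry would then be something like ["CMD", "python3", "/healthcheck.py"], or an equivalent shell one-liner.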
Now you can add some dummy service just to check that the app service is healthy. Depending on your particular case, this responsibility can be handed to the container with the integration tests. Either way, this other service should have a depends_on block that mentions the app service:
wait-for-app-setup:
  image: ...
  depends_on:
    myapp:
      condition: service_healthy
As a result, docker-compose will take care of the correct startup order and proper wait periods for everything to start and set up correctly.

Related

GitLab Runner + docker-compose deployment scheme: how to properly restart containers after a host server reboot

Suppose I have a repository on GitLab and the following deployment scheme:
Set up Docker and gitlab-runner with the Docker executor on the host server.
In .gitlab-ci.yml, set up docker-compose to build and bring up my service together with its dependencies.
Set up the pipeline to be triggered by pushing commits to the production branch.
Suppose docker-compose.yml has two services: app (with restart: always) and db (without a restart rule). app depends on db, so docker-compose up starts db and then app.
It works perfectly until the host server reboots. Afterwards, only the app container restarts.
Workarounds I've found and their cons:
Add restart: always to the db service. But app can start before db and hence fail.
Use docker-compose on the host machine and set docker-compose up to run automatically. But in that case I would have to set up docker-compose, deploy SSH keys, clone the code somewhere on the host server, and keep it updated. That seems to violate the DRY principle and overcomplicate the scheme.
Trigger the pipeline after reboot. The only way I've found is to trigger it via the API with a trigger token. But then I have to set up a trigger token, which is not as bad as the previous options but still violates the DRY principle and overcomplicates the scheme.
How can I improve the deployment scheme so that Docker restarts the containers in the right order after a reboot?
P.S. The configs are as follows:
.gitlab-ci.yml:
image:
  name: docker/compose:latest
services:
  - docker:dind
stages:
  - deploy
deploy:
  stage: deploy
  only:
    - production
  script:
    - docker image prune -f
    - docker-compose build --no-cache
    - docker-compose up -d
docker-compose.yml:
version: "3.8"
services:
app:
build: .
container_name: app
depends_on:
- db
ports:
- "80:80"
restart: always
db:
image: postgres
container_name: db
ports:
- "5432:5432"
When you add restart: always to the db service, your app can start before db and fail. But your app should then be restarted as well because of its own restart: always policy; if that isn't happening, your failed app is probably exiting with an unexpected exit code.
So you can add a healthcheck and delay the app's restart until the point where you expect it to be able to work.
A simple check of port 80 can help.
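For illustration, such a probe could be a tiny script along these lines (Python is used here only as an example; the script name is an assumption, and a shell tool such as curl or nc baked into the image would do the same job):

#!/usr/bin/env python3
# port_check.py - hypothetical healthcheck command: exit 0 if the app
# accepts TCP connections on port 80, exit 1 otherwise.
import socket
import sys

try:
    with socket.create_connection(("127.0.0.1", 80), timeout=3):
        pass
except OSError:
    sys.exit(1)
sys.exit(0)

It would be referenced from the compose healthcheck as something like test: ["CMD", "python3", "/port_check.py"].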
This basically happens because your app fails fast when the database is unavailable.
Failing fast can be useful in some cases, but for your use case you could implement the app so that it retries to establish the connection if the first attempt fails. Ideally a backoff strategy would be used so that you don't overload your database in case of a real issue.
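A rough sketch of such a retry with exponential backoff (connect_to_db is a placeholder for however the app really opens its database connection; it is not part of the original code):

import time

def connect_with_backoff(connect_to_db, max_attempts=8, base_delay=1.0, max_delay=30.0):
    # connect_to_db is a placeholder callable that opens and returns a DB
    # connection and raises an exception while the database is not ready.
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect_to_db()
        except Exception as exc:
            if attempt == max_attempts:
                raise  # give up; let the restart policy / orchestrator take over
            print(f"Database not ready (attempt {attempt}): {exc}; retrying in {delay:.0f}s")
            time.sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff with a cap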
Losing the connection to the database can happen, but does it make sense to kill your app when the database is unavailable? Could you implement any fallback, e.g. "Sorry, we have an issue but we are working on it"? From the user's perspective, letting them know that you have an issue and are working to fix it is a much better experience than the app simply not starting.

Docker compose and Fargate rollbacks

I'm trying to improve my service by setting up a rollback strategy in case my changes crash the container and keep the tasks exiting.
Context:
I have a simple service that I update by changing the image tag.
services:
  web:
    image: AWS-Account-Id.dkr.ecr.us-east-1.amazonaws.com/my-service:1
    environment:
      - ENV=dev
    ports: ["80:80"]
I make some change to the Docker image; build, tag, and push it to ECR; then update the tag to 2 (for example) and run docker compose up.
Let's say I introduce an error and the container starts but then stops (due to the error). ECS will keep trying to start and stop the container with the error: Essential container in task exited.
Is there a way in docker-compose to set a condition so that, if it tries to start the web container twice and the tasks fail to reach and maintain a running status, the changes are rolled back or a CloudFormation cancel-update operation is performed?
There is a load balancer that listens on port 80, and I also added a health check to the service:
healthcheck:
  test: ["CMD", "curl", "-f", "/status"]
  interval: 1m
  timeout: 10s
  retries: 2
  start_period: 40s
But I cannot make it work: tasks keep exiting and the CloudFormation deployment keeps going.
There is no direct way to do this, but you can consider the following approach:
Create a WaitCondition and a WaitConditionHandle resource.
Calibrate how long it usually takes for the task/container to start and set the timeout accordingly.
Configure the application to post a success signal to the handle's endpoint URL on successful setup.
Ensure that the service and the wait condition handle start updating/creating in parallel.
If the timeout period is exceeded without a signal, the wait condition will fail and trigger a rollback.
Something to consider: on every update, the WaitConditionHandle and WaitCondition resources need to be re-created. An easy way to do that is to change the logical IDs of the resources; for example, a parameters/template hash calculator can append the hash as a suffix to the wait condition resource names. That way, whenever the parameters or template change, the wait condition resources are recreated automatically.
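For the "post a success signal" step, a minimal sketch could look like the following, assuming the presigned WaitConditionHandle URL is handed to the container in an environment variable named WAIT_HANDLE_URL (that variable name and the use of Python are assumptions; the body follows the wait condition signal format from the AWS documentation):

import json
import os
import urllib.request

def signal_wait_condition(status="SUCCESS", reason="Container setup complete"):
    # WAIT_HANDLE_URL is an assumed environment variable holding the
    # presigned WaitConditionHandle URL exposed by the CloudFormation stack.
    url = os.environ["WAIT_HANDLE_URL"]
    body = json.dumps({
        "Status": status,        # "SUCCESS" or "FAILURE"
        "Reason": reason,
        "UniqueId": "web-task",  # any identifier that is unique per signal
        "Data": "ready",
    }).encode()
    request = urllib.request.Request(url, data=body, method="PUT")
    # Override urllib's default form content type, which can invalidate
    # the presigned URL's signature; the handle expects an empty value.
    request.add_header("Content-Type", "")
    urllib.request.urlopen(request, timeout=10)

if __name__ == "__main__":
    signal_wait_condition()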

How did the Docker service manage to call an instance from a separate Docker container?

I have recently started using Docker + Celery. I have shared the full sample code for this example on GitHub, and the following are some code snippets from it to help explain my point.
For context, my example is designed to be a node that subscribes to events in a system of microservices. This node comprises the following services:
the Subscriber (using kombu to subscribe to events)
the Worker (using celery for async task acting on the events)
Redis (as message broker and result backend for celery)
The services are defined in a docker-compose.yml file as follows:
version: "3.7"
services:
# To launch the Subscriber (using Kombu incl. in Celery)
subscriber:
build: .
tty: true
#entrypoint: ...
# To launch Worker (Celery)
worker:
build: .
entrypoint: celery worker -A worker.celery_app --loglevel=info
depends_on:
- redis
redis:
image: redis
ports:
- 6379:6379
entrypoint: redis-server
For simplicity, I have left out the subscriber code; using the Python interactive shell in the subscriber container should suffice for this example:
python3
>>> from worker import add
>>> add.delay(2,3).get()
5
And in the worker container logs:
worker_1 | [2020-09-17 10:12:34,907: INFO/ForkPoolWorker-2] worker.add[573cff6c-f989-4d06-b652-96ae58d0a45a]: Adding 2 + 3, res: 5
worker_1 | [2020-09-17 10:12:34,919: INFO/ForkPoolWorker-2] Task worker.add[573cff6c-f989-4d06-b652-96ae58d0a45a] succeeded in 0.011764664999645902s: 5
While everything seems to be working, I feel uneasy. This example doesn't seem to respect the isolation principle of a Docker container.
Aren't containers designed to be isolated at the level of their OS, processes and network? And if containers have to communicate, shouldn't it be done via IP addresses and network protocols (TCP/UDP, etc.)?
Firstly, the worker and the subscriber run the same codebase in my example, so no issue is expected with the import statement.
However, the Celery worker is launched from the entrypoint in the worker container, so how did the subscriber manage to call the Celery worker instance in the supposedly isolated worker container?
To further verify that it is in fact calling the Celery worker instance from the worker container, I stopped the worker container and repeated the Python interactive shell example in the subscriber container. The request waited (which is expected of Celery) and returned the same result as soon as the worker container was turned back on. So IMO, yes, a service from one container is calling an app instance from another container WITHOUT networking, unlike the case of connecting to Redis (using an IP address, etc.).
Please advise whether my understanding is incorrect or there is a wrong implementation somewhere that I am not aware of.
Both the consumer (worker) and the producer (subscriber) are configured to use Redis (redis) as both the broker and the result backend. That is why it all worked. When you executed add.delay(2,3).get() in the subscriber container, it sent the task to Redis, and it got picked up by the Celery worker running in a different container.
Keep in mind that the Python process running the add.delay(2,3).get() code runs in the subscriber container, while the ForkPoolWorker-2 process that executed the add() function and stored the result in the result backend runs in the worker container. These processes are completely independent.
The subscriber process did not call anything in the worker container! In plain English, what it did was: "here (in Redis) is what I need done; workers, please do it and let me know you are done so that I can fetch the result".
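For reference, given the entrypoint celery worker -A worker.celery_app and the task name worker.add in the logs, the shared worker.py presumably looks roughly like this (the broker and backend URLs are assumptions based on the redis service in the compose file):

# worker.py - sketch of the module shared by both containers. Both the
# subscriber and the worker import it, but only the worker container
# actually runs a Celery worker process against it.
from celery import Celery

celery_app = Celery(
    "worker",
    broker="redis://redis:6379/0",   # "redis" resolves via the compose network
    backend="redis://redis:6379/0",  # result backend, so .delay(...).get() can return 5
)

@celery_app.task
def add(x, y):
    return x + y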
docker-compose creates a default Docker network for the containers defined in a single file. Since you are pointing everything at the right place, the requests go over that network, which is why this succeeds. I would be surprised if this still worked if you were to, for example, run each container separately without docker-compose.

How can I start one container before another?

I need to start backend-container after database-container has started. How can I do that with docker-compose?
Use a depends_on clause on your backend-container. Something like this:
version: "3.7"
services:
web:
build: .
depends_on:
- db
db:
image: postgres
Here is the documentation about it.
Have fun!
You should look into the depends_on configuration for docker compose.
In short, you should be able to do something like:
services:
  database-container:
    # configuration
  backend-container:
    depends_on:
      - database-container
    # configuration
The depends_on field will work with docker-compose, but you will find it is not supported if you upgrade to swarm mode. It also only guarantees that the database container is created, not that it is ready to receive connections.
For that, there are several options:
Let the backend container fail and configure a restart policy. This is ugly, leads to false errors being reported, but it's also the easiest to implement.
Perform a connection from your app with a retry loop, a sleep between retries, and a timeout in case the database doesn't start in a timely fashion. This is usually my preferred method, but it requires a change to your application.
Use an entrypoint script with a command like wait-for-it.sh that waits for a remote resource to become available, and once that command succeeds, launch your app. This doesn't cover all the scenarios that a complete client connection would, but it can be less intrusive to implement since it only requires changes to an entrypoint script rather than to the app itself.
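To make the third option concrete, a stripped-down stand-in for wait-for-it.sh could look like the sketch below (the db host, port, and deadline are placeholders; in practice the real wait-for-it.sh script is the usual choice):

#!/usr/bin/env python3
# wait_then_start.py - sketch of an entrypoint that waits for the database
# port to accept TCP connections, then replaces itself with the real app.
import os
import socket
import sys
import time

HOST, PORT, DEADLINE = "db", 5432, 60  # placeholders: service name, port, seconds to wait

start = time.time()
while True:
    try:
        with socket.create_connection((HOST, PORT), timeout=2):
            break  # the port is open: a coarse but often sufficient readiness signal
    except OSError:
        if time.time() - start > DEADLINE:
            sys.exit(f"{HOST}:{PORT} still not reachable after {DEADLINE}s")
        time.sleep(1)

# Hand over to the real application, e.g. "wait_then_start.py python app.py"
os.execvp(sys.argv[1], sys.argv[1:])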

Make a Docker image the base for another image

Now I have built a simple GET API to access this database: https://github.com/ghusta/docker-postgres-world-db
This API takes a country code and returns the full record for that country from the database.
The structure is that the API is in one Docker image and the database is in another.
So when the API's image tries to start, I need it to start the database's image first, and only then start running against the database.
So how to do that?
You can use Docker Compose, specifically the depends_on directive. This will cause Docker to start all dependencies before starting the dependent container.
Unfortunately there is no way to make it wait for the dependency to go live before starting any dependents. You'll have to manage that yourself with a wait script or similar.
The most likely solution is to use Docker Compose along with a third-party script.
For example, your docker-compose file might look like:
services:
  web:
    build: .
    ports:
      - "80:8000"
    depends_on:
      - "db"
    command: ["./wait-for-it.sh", "db:5432", "--", "python", "app.py"]
  db:
    image: postgres
Where ./wait-for-it.sh is a third party script you can get from
https://github.com/vishnubob/wait-for-it
You can also use this script from
https://github.com/Eficode/wait-for
I would recommend tweaking the script to your needs if you want to (I did that).
P.S.:
The problem of waiting for a database (for example) to be ready is really just a subset of a much larger problem of distributed systems. In production, your database could become unavailable or move hosts at any time. Your application needs to be resilient to these types of failures.
To handle this, design your application to attempt to re-establish a connection to the database after a failure. If the application retries the connection, it can eventually connect to the database.
The best solution is to perform this check in your application code, both at startup and whenever a connection is lost for any reason.
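As a sketch of that idea (connect and operation are placeholders for your real database client code, not part of the original answer), database access can be wrapped so that a dropped connection triggers a reconnect and one retry instead of killing the app:

import time

class ResilientDB:
    def __init__(self, connect, retries=5, delay=2.0):
        self._connect = connect  # placeholder callable returning a live DB connection
        self._retries = retries
        self._delay = delay
        self._conn = self._open()  # retry the connection at startup

    def _open(self):
        for attempt in range(1, self._retries + 1):
            try:
                return self._connect()
            except Exception:
                if attempt == self._retries:
                    raise
                time.sleep(self._delay)

    def run(self, operation):
        # operation is a callable taking the connection, e.g. lambda c: c.execute(...)
        try:
            return operation(self._conn)
        except Exception:
            self._conn = self._open()  # the connection was lost: reconnect, then retry once
            return operation(self._conn)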
