How do you perform a HEALTHCHECK in the Redis Docker image? - docker

Recently we had an outage because Redis was unable to write to its file system (we're not sure why; it's on Amazon EFS). I noticed that there was no actual HEALTHCHECK set up for the Docker service to make sure it is running correctly. The port stays open even when Redis is in that state, so I can't simply use nc -z to check it.
Is there a command I can execute in the redis:6-alpine (or non-alpine) image that I can put in the healthcheck block of the docker-compose.yml file?
Note: I am looking for a command that is available inside the image, not an external healthcheck.

If I remember correctly, that image includes redis-cli, so maybe something along these lines:
...
healthcheck:
  test: ["CMD", "redis-cli", "ping"]

Although the ping operation from @nitrin0's answer generally works, it does not handle the case where a write operation would actually fail. So instead I perform a change that just increments a key I don't plan to use.
image: redis:6
healthcheck:
  test: [ "CMD", "redis-cli", "--raw", "incr", "ping" ]

I've just noticed that there is a phase in which Redis is still starting up and loading its dataset. In this phase, redis-cli ping prints the error
LOADING Redis is loading the dataset in memory
but still returns exit code 0, which would make Docker report Redis as healthy too early.
Likewise, redis-cli --raw incr ping returns 0 in this phase without actually incrementing the key.
As a workaround, I check whether redis-cli ping actually prints a PONG, which it only does once the LOADING phase has finished.
services:
  redis:
    healthcheck:
      test: ["CMD-SHELL", "redis-cli ping | grep PONG"]
      interval: 1s
      timeout: 3s
      retries: 5
This works because grep returns 0 only when the string ("PONG") is found.
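To double-check the exit-code behaviour by hand, you can run the same pipeline through docker exec (the container name my-redis is just a placeholder):
docker exec my-redis sh -c 'redis-cli ping | grep PONG'
echo $?   # 0 once Redis has finished loading, non-zero while it still reports LOADING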

You can also add it inside the Dockerfile, if you're using a Redis image that contains redis-cli:
Linux Docker
HEALTHCHECK CMD redis-cli ping || exit 1
Windows Docker
HEALTHCHECK CMD pwsh.exe -command \
    try { \
        $response = ./redis-cli ping; \
        if ($response -eq 'PONG') { exit 0 } else { exit 1 }; \
    } catch { exit 1 }
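On Linux you can also pass timing options to the HEALTHCHECK instruction and combine it with the LOADING-aware PONG check from above; a sketch, with purely illustrative interval, timeout and retry values:
HEALTHCHECK --interval=5s --timeout=3s --retries=5 \
  CMD redis-cli ping | grep PONG || exit 1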

Related

Does docker-compose support init container?

Init containers are a great feature in Kubernetes, and I wonder whether docker-compose supports them. They would let me run some command before launching the main application.
I came across this PR https://github.com/docker/compose-cli/issues/1499 which mentions support for init containers, but I can't find related documentation in their reference.
This was a discovery for me, but yes, it has been possible to use init containers with docker-compose since version 1.29, as can be seen in the PR you linked in your question.
At the time of writing, this feature does not yet seem to have found its way into the documentation.
You can define a dependency on another container with a condition that is basically "when that other container has successfully finished its job". This leaves room to define containers that run any kind of script and exit when they are done, before another dependent container is launched.
To illustrate, I crafted an example with a pretty common scenario: spin up a db container, make sure the db is up and initialize its data prior to launching the application container.
Note: initializing the db (at least as far as the official mysql image is concerned) does not require an init container, so this example is more an illustration than a rock-solid typical workflow.
The complete example is available in a public github repo so I will only show the key points in this answer.
Let's start with the compose file
---
x-common-env: &cenv
  MYSQL_ROOT_PASSWORD: totopipobingo

services:
  db:
    image: mysql:8.0
    command: --default-authentication-plugin=mysql_native_password
    environment:
      <<: *cenv
  init-db:
    image: mysql:8.0
    command: /initproject.sh
    environment:
      <<: *cenv
    volumes:
      - ./initproject.sh:/initproject.sh
    depends_on:
      db:
        condition: service_started
  my_app:
    build:
      context: ./php
    environment:
      <<: *cenv
    volumes:
      - ./index.php:/var/www/html/index.php
    ports:
      - 9999:80
    depends_on:
      init-db:
        condition: service_completed_successfully
You can see I define 3 services:
The database which is the first to start
The init container which starts only once db is started. This one only runs a script (see below) that will exit once everything is initialized
The application container which will only start once the init container has successfully done its job.
The initproject.sh script run by the init-db container is very basic for this demo: it simply retries connecting to the db every 2 seconds until it succeeds or reaches a limit of 50 tries, then creates a database and table and inserts some data:
#! /usr/bin/env bash
# Test we can access the db container allowing for start
for i in {1..50}; do mysql -u root -p${MYSQL_ROOT_PASSWORD} -h db -e "show databases" && s=0 && break || s=$? && sleep 2; done
if [ ! $s -eq 0 ]; then exit $s; fi
# Init some stuff in db before leaving the floor to the application
mysql -u root -p${MYSQL_ROOT_PASSWORD} -h db -e "create database my_app"
mysql -u root -p${MYSQL_ROOT_PASSWORD} -h db -e "create table my_app.test (id int unsigned not null auto_increment primary key, myval varchar(255) not null)"
mysql -u root -p${MYSQL_ROOT_PASSWORD} -h db -e "insert into my_app.test (myval) values ('toto'), ('pipo'), ('bingo')"
The Dockerfile for the app container is trivial (adding a mysqli driver for php) and can be found in the example repo, along with the php script used to test that the init was successful by calling http://localhost:9999 in your browser.
The interesting part is to observe what's going on when launching the service with docker-compose up -d.
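For example, you can watch the init container run and exit before the app comes up (the service names are those from the compose file above; the exact status wording depends on your compose version):
docker-compose up -d
docker-compose ps          # init-db should show an exited/Exit 0 state once it is done
docker-compose logs init-db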
The only limit to what can be done with such a feature is probably your imagination ;) Thanks for making me discover this.

Docker healthcheck causes the container to crash

I have a customized rabbitmq image that I am using with docker-compose (3.7) to launch a Docker cluster. This is necessary because of some peculiar issues when trying to deploy a cluster in Docker Swarm. The image has a shell script which runs on the primary and secondary nodes and makes the modifications needed to run a cluster. This involves stopping rabbitmq and running rabbitmqctl commands to create the cluster between the two nodes. This configuration works flawlessly until I try to add in a healthcheck. I have tried adding it to the image and adding it to the compose file; both cause the container to crash and constantly restart. I have the following shell script which gets copied into the image:
#!/bin/bash
set -eo pipefail
# A RabbitMQ node is considered healthy if all the below are true:
# * the rabbit app finished booting & it's running
# * there are no alarms
# * there is at least 1 active listener
rabbitmqctl eval '
{ true, rabbit_app_booted_and_running } = { rabbit:is_booted(node()), rabbit_app_booted_and_running },
{ [], no_alarms } = { rabbit:alarms(), no_alarms },
[] /= rabbit_networking:active_listeners(),
rabbitmq_node_is_healthy.
' || exit 1
On an already running image this works and produces the correct result.
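For reference, this is roughly how such a script might be wired into the image; the path and the docker-healthcheck name are assumptions based on the compose file below, not the actual Dockerfile:
COPY docker-healthcheck /usr/local/bin/docker-healthcheck
RUN chmod +x /usr/local/bin/docker-healthcheck
HEALTHCHECK --start-period=600s --interval=60s --timeout=60s --retries=10 \
  CMD docker-healthcheck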
I tried the following in the compose file:
healthcheck:
  interval: 60s
  timeout: 60s
  retries: 10
  start_period: 600s
  test: ["CMD", "docker-healthcheck"]
It seems that the start_period is completely ignored; I can see the health status reporting an error right away. I have also tried the following native rabbitmq-diagnostics command:
rabbitmq-diagnostics -q check_running && rabbitmq-diagnostics -q check_local_alarms
This oddly fails with an "unable to find rabbitmq-diagnostics" error, despite the fact that the program is definitely in the path. I can execute the command successfully in an already running container.
If I create the container without the healthcheck and then add it in after the fact from the command line with:
docker service update --health-cmd docker-healthcheck --health-interval 60s --health-timeout 60s --health-retries 10 [container id]
it marks the container healthy. So it works, just not in a startup configuration. It seems to me that the healthcheck should not begin until 10 minutes have passed, yet no matter how long I set the start_period parameter, the container still fails.
Is this a bug or is there something mysterious about the way start_period works?
Has anyone else ever had this problem?

Display success message after docker-compose up is done

I run many services in my docker-compose.yml. I'd like to display a message to let the user know when docker-compose up is done.
I tried to echo a message with command, but my container exited with code 0:
command: bash -c "echo Congratulation! You can use your containers now"
Is there any way to let the user know when docker-compose up is done?
Many thanks!
Anything you do in CMD will only be visible in the logs, and only per service, so you may as well just watch the logs, which I assume is what you're trying to avoid. If you want it to work from the local command line, you're going to need to wrap the docker-compose up command.
The following is NOT fully tested and is for guidance only. I use the getContainerHealth and waitContainer scripts myself, so I can vouch for them. I'm fairly sure I found them on this site, but can't remember where.
Assuming you're waiting for your services to be in a usable state, you could use something similar to this compose file:
docker-compose.yml
services:
  serviceOne:
    # ... the usual stuff
    container_name: serviceOne
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost"]
      interval: 10s
      timeout: 10s
      retries: 3
      start_period: 1s
  serviceTwo:
    # ... the usual stuff
    container_name: serviceTwo
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost"]
      interval: 10s
      start_period: 1s
The commands in test are whatever you use to determine that the service is up and running. More info here: https://docs.docker.com/compose/compose-file/#healthcheck
You will probably want different health checks (or at least intervals, timeouts, start_periods, etc) for production and development.
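One way to do that, assuming the standard override mechanism, is to keep the development-only settings in docker-compose.override.yml, which docker-compose up picks up automatically; the values here are purely illustrative:
# docker-compose.override.yml
services:
  serviceOne:
    healthcheck:
      interval: 2s
      timeout: 2s
      retries: 10
      start_period: 1s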
then some bash script that you use instead of docker-compose up:
getContainerHealth () {
  docker inspect --format "{{json .State.Health.Status }}" "$1"
}

waitContainer () {
  while STATUS=$(getContainerHealth "$1"); [ "$STATUS" != "\"healthy\"" ]; do
    if [ "$STATUS" = "\"unhealthy\"" ]; then
      echo "Failed!"
      exit 1
    fi
    printf .
    sleep 1
  done
  printf "\n"
}
waitContainers () {
  waitContainer serviceOne
  waitContainer serviceTwo
  # ... and so on for all services
}

docker-compose up -d
echo "In progress. Please wait"
waitContainers
echo "Done. Ready to roll."
Create a shell file echo.sh that contains:
echo "Your message"
In the Dockerfile, add:
CMD ["/dockerRepo/echo.sh"]
Have you tried running something like the following directly in the Dockerfile itself?
RUN echo -e "Your message"

HEALTHCHECK of a Docker container running Celery tasks?

I know one of the ways to check health for a Docker container is using the command
HEALTHCHECK CMD curl --fail http://localhost:3000/ || exit 1
But in the case of workers there is no such URL to hit. How do I check the container's health in that case?
The celery inspect ping command comes in handy here, as it does a full round trip: it sends a "ping" task on the broker, the workers respond, and celery fetches the responses.
Assuming your app is named tasks.add, you may ping all your workers:
/app $ celery inspect ping -A tasks.add
-> celery@aa7c21dd0e96: OK
pong
-> celery@57615db15d80: OK
pong
With aa7c21dd0e96 being the Docker hostname, and thus available in $HOSTNAME.
To ping a single node, you would have to run:
celery inspect ping -A tasks.add -d celery@$HOSTNAME
Here, -d stands for destination.
The line to add to your Dockerfile:
HEALTHCHECK CMD celery inspect ping -A tasks.add -d celery@$HOSTNAME
Sample outputs:
/app $ celery inspect ping -A tasks.add -d fake_node
Error: No nodes replied within time constraint.
/app $ echo $?
69
Unhealthy if the node does not exist or does not reply
/app $ celery inspect ping -A tasks.add -d celery@$HOSTNAME
-> celery@d39b3d31cc13: OK
pong
/app $ echo $?
0
Healthy when the node replies pong.
/app $ celery inspect ping -d celery@$HOSTNAME
Traceback (most recent call last):
...
raise socket.error(last_err)
OSError: [Errno 111] Connection refused
/app $ echo $?
1
Unhealthy when the broker is not available. (I removed the app, so it tries to connect to a local AMQP broker and fails.)
This might not suit your needs, though: in that last case the broker is unhealthy, not the worker.
The example snippet below, derived from the one posted by @PunKeel, is applicable for those looking to implement the health check in docker-compose.yml, which can be used through docker-compose or docker stack deploy.
worker:
  build:
    context: .
    dockerfile: Dockerfile
  image: myimage
  links:
    - rabbitmq
  restart: always
  command: celery worker --hostname=%h --broker=amqp://rabbitmq:5672
  healthcheck:
    test: celery -b amqp://rabbitmq:5672 inspect ping -d celery@$$HOSTNAME
    interval: 30s
    timeout: 10s
    retries: 3
Notice the extra $ in the command, so that $HOSTNAME actually gets passed into the container. I also didn't use the -A flag.
Ideally, rabbitmq should also have its own health check, perhaps with curl guest:guest@localhost:15672/api/overview, since with celery inspect ping alone Docker can't discern whether the worker or the broker is down.
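A sketch of what that might look like in the compose file, using rabbitmq-diagnostics (which ships with the official image) rather than curl, since curl may not be installed in the rabbitmq image; the timing values are illustrative:
rabbitmq:
  image: rabbitmq:3-management
  healthcheck:
    test: rabbitmq-diagnostics -q ping
    interval: 30s
    timeout: 10s
    retries: 3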
For Celery 5.2.3 I used celery -A [celery app name] status for the health check. This is how my docker-compose file looks:
worker:
  build: .
  healthcheck:
    test: celery -A app.celery_app status
    interval: 10s
    timeout: 10s
    retries: 10
  volumes:
    - ./app:/app
  depends_on:
    - broker
    - redis
    - database
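To confirm the check behaves as expected once the stack is up, you can read the health state back with docker inspect (the container name here is just an example):
docker inspect --format '{{ .State.Health.Status }}' myproject-worker-1
# prints starting, healthy or unhealthy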
Landed on this question looking for a health check for Celery workers as part of an Airflow setup (Airflow 2.3.4, Celery 5.2.7), which I eventually figured out. This is a very specific use case of the original question, but might still be useful for some:
# docker-compose.yml
worker:
  image: ...
  hostname: local-worker
  entrypoint: airflow celery worker
  ...
  healthcheck:
    test: [ "CMD-SHELL", 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"' ]
    interval: 5s
    timeout: 10s
    retries: 10
  restart: always
  ...
I got inspiration from Airflow's quick-start Docker Compose.

docker with ansible wait for database

I'm trying to deploy Docker containers with Ansible. I have one database container, my web app runs in another container, and I try to link the two. The problem is that the database container doesn't have time to configure itself before the web container starts. My Ansible playbook looks something like this:
...
- name: run mysql in docker container
  docker:
    image: "mysql:5.5"
    name: database
    env: "MYSQL_ROOT_PASSWORD=password"
    state: running

- name: run application containers
  docker:
    name: "application"
    image: "myapp"
    ports:
      - "8080:8080"
    links:
      - "database:db"
    state: running
How can I determine whether the database has started? I tried the wait_for module, but that didn't work. I don't want to set a fixed timeout; that's not a good option for me.
wait_for does not work for the MySQL Docker container because it only checks that the port is connectable (which is true straight away for the Docker container). However, wait_for does not check that the service inside the container is listening on the port and sending responses to the client.
This is how I wait in the Ansible playbook for the MySQL service to become fully operational inside the Docker container:
- name: Start MySQL container
  docker:
    name: some-name
    image: mysql:latest
    state: started
    ports:
      - "8306:3306" # it's important to expose the port for waiting requests
    env:
      MYSQL_ROOT_PASSWORD: "{{ mysql_root_password }}"

- template: mode="a+rx,o=rwx" src=telnet.sh.j2 dest=/home/ubuntu/telnet.sh

# wait while MySQL is starting
- action: shell /home/ubuntu/telnet.sh
  register: result
  until: result.stdout.find("mysql_native_password") != -1
  retries: 10
  delay: 3
And the telnet.sh.j2 is
#!/bin/bash -e
telnet localhost 8306 || true
To avoid the shell script (and because I don't normally have telnet installed):
- name: Wait for database to be available
  shell: docker run --rm --link mysql:mysql mysql sh -c 'mysql -h"$MYSQL_PORT_3306_TCP_ADDR" -P"$MYSQL_PORT_3306_TCP_PORT" -uroot -p{{mysql_password}} || true'
  register: result
  until: result.stderr.find("Can't connect to MySQL") == -1
  retries: 10
  delay: 3
As etrubenok said:
wait_for does not work for the MySQL Docker container because it only checks that the port is connectable (which is true straight away for the Docker container). However, wait_for does not check that the service inside the container is listening on the port and sending responses to the client.
Using Andy Shinn's suggestion on FreshPow's answer, you can wait without needing a shell script or telnet:
- name: Wait for mariadb
  command: >
    docker exec {{ container|quote }}
    mysqladmin ping -u{{ superuser|quote }} -p{{ superuser_password|quote }}
  register: result
  until: not result.rc  # or result.rc == 0 if you prefer
  retries: 20
  delay: 3
This runs mysqladmin ping ... until it succeeds (return code 0). Usually superuser is root. I tested using podman instead of docker, but I believe the command is the same regardless. |quote does shell escaping, which according to the Ansible docs should also be done when using command.
This works for me just fine:
- name: get mariadb IP address
  command: "docker inspect --format '{''{ .NetworkSettings.IPAddress }''}' mariadb-container"
  register: mariadb_ip_address

- name: wait for mariadb to become ready
  wait_for:
    host: "{{ mariadb_ip_address.stdout }}"
    port: 3306
    state: started
    delay: 5
    connect_timeout: 15
    timeout: 30
Use the wait_for module. I'm no expert on MySQL, but I assume there is some port, file, or log message you can check to find out whether the DB is up or not.
Here are examples of wait_for copied from the link above.
# wait 300 seconds for port 8000 to become open on the host, don't start checking for 10 seconds
- wait_for: port=8000 delay=10
# wait 300 seconds for port 8000 of any IP to close active connections, don't start checking for 10 seconds
- wait_for: host=0.0.0.0 port=8000 delay=10 state=drained
# wait 300 seconds for port 8000 of any IP to close active connections, ignoring connections for specified hosts
- wait_for: host=0.0.0.0 port=8000 state=drained exclude_hosts=10.2.1.2,10.2.1.3
# wait until the file /tmp/foo is present before continuing
- wait_for: path=/tmp/foo
# wait until the string "completed" is in the file /tmp/foo before continuing
- wait_for: path=/tmp/foo search_regex=completed
# wait until the lock file is removed
- wait_for: path=/var/lock/file.lock state=absent
# wait until the process is finished and pid was destroyed
- wait_for: path=/proc/3466/status state=absent
# wait 300 seconds for port 22 to become open and contain "OpenSSH", don't assume the inventory_hostname is resolvable
# and don't start checking for 10 seconds
- local_action: wait_for port=22 host="{{ ansible_ssh_host | default(inventory_hostname) }}" search_regex=OpenSSH delay=10
I was able to use wait_for like this:
- name: "MySQL - Check mysql - Wait for mysql to be up"
wait_for:
host: 127.0.0.1
port: 3306
search_regex: "(mysql_native_password|caching_sha2_password)"
This way it will wait for the port to be up and for the service to send some data.
The drawback is that the output may change with MySQL versions and configurations. The strings in the example are for MySQL 5.5 and 8.0; adjust them for your use case.
An alternative, avoiding wait_for, command or shell, is to retry some MySQL module task several times until it succeeds:
- name: "MySQL - Check mysql - if it responds"
mysql_info:
login_user: root
login_password: "{{ mysql_root_password }}"
filter:
- version
register: mysql_result
until: mysql_result is not failed
retries: 5
delay: 10
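If MySQL runs inside a container with a published port rather than directly on the Ansible target, you will likely also need to point the task at that port; a sketch reusing the 8306 mapping from earlier (the host and port values are illustrative):
- name: "MySQL - Check mysql - if it responds"
  mysql_info:
    login_user: root
    login_password: "{{ mysql_root_password }}"
    login_host: 127.0.0.1
    login_port: 8306
    filter:
      - version
  register: mysql_result
  until: mysql_result is not failed
  retries: 5
  delay: 10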
