I know one of the ways to check the health of a Docker container is using the command
HEALTHCHECK CMD curl --fail http://localhost:3000/ || exit 1
But in the case of workers there is no such URL to hit. How do I check the container's health in that case?
The celery inspect ping command comes in handy here, as it does a whole round trip: it sends a "ping" task on the broker, workers respond, and celery fetches the responses.
Assuming your app is named tasks.add, you may ping all your workers:
/app $ celery inspect ping -A tasks.add
-> celery@aa7c21dd0e96: OK
        pong
-> celery@57615db15d80: OK
        pong
With aa7c21dd0e96 being the Docker hostname, and thus available in $HOSTNAME.
To ping a single node, you would have to run:
celery inspect ping -A tasks.add -d celery@$HOSTNAME
Here, -d stands for destination.
The line to add to your Dockerfile:
HEALTHCHECK CMD celery inspect ping -A tasks.add -d celery@$HOSTNAME
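If you want to tune how often Docker runs the probe, the HEALTHCHECK instruction also accepts interval/timeout/retries options; a minimal sketch with arbitrary values (adjust to your worker's startup time):
HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
    CMD celery inspect ping -A tasks.add -d celery@$HOSTNAME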
Sample outputs:
/app $ celery inspect ping -A tasks.add -d fake_node
Error: No nodes replied within time constraint.
/app $ echo $?
69
Unhealthy if the node does not exist or does not reply
/app $ celery inspect ping -A tasks.add -d celery@$HOSTNAME
-> celery@d39b3d31cc13: OK
        pong
/app $ echo $?
0
Healthy when the node replies pong.
/app $ celery inspect ping -d celery@$HOSTNAME
Traceback (most recent call last):
...
raise socket.error(last_err)
OSError: [Errno 111] Connection refused
/app $ echo $?
1
Unhealthy when the broker is not available - I removed the -A app option, so it tries to connect to a local AMQP broker and fails.
This might not suit your needs, though: in this case it is the broker that is unhealthy, not the worker.
The example snippet below, derived from the one posted by @PunKeel, is applicable for those looking to implement a health check in docker-compose.yml, which can be used through docker-compose or docker stack deploy.
worker:
  build:
    context: .
    dockerfile: Dockerfile
  image: myimage
  links:
    - rabbitmq
  restart: always
  command: celery worker --hostname=%h --broker=amqp://rabbitmq:5672
  healthcheck:
    test: celery -b amqp://rabbitmq:5672 inspect ping -d celery@$$HOSTNAME
    interval: 30s
    timeout: 10s
    retries: 3
Notice the extra $ in the command, so that $HOSTNAME actually gets expanded inside the container. I also didn't use the -A flag.
Ideally, rabbitmq should also have its own health check, perhaps with curl guest:guest@localhost:15672/api/overview, since with celery inspect ping alone Docker wouldn't be able to discern whether the worker or the broker is down.
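For example, a sketch of such a broker health check in the compose file; rabbitmq-diagnostics ships with recent official images, while the curl variant assumes curl and the management API are available in the container:
rabbitmq:
  image: rabbitmq:3-management
  healthcheck:
    test: rabbitmq-diagnostics -q ping
    # or: curl -f -u guest:guest http://localhost:15672/api/overview
    interval: 30s
    timeout: 10s
    retries: 3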
For Celery 5.2.3 I used celery -A [celery app name] status for the health check. This is what my docker-compose file looks like:
worker:
  build: .
  healthcheck:
    test: celery -A app.celery_app status
    interval: 10s
    timeout: 10s
    retries: 10
  volumes:
    - ./app:/app
  depends_on:
    - broker
    - redis
    - database
Landed on this question looking for a health check for Celery workers as part of an Airflow setup (Airflow 2.3.4, Celery 5.2.7), which I eventually figured out. This is a very specific use case of the original question, but might still be useful for some:
# docker-compose.yml
worker:
  image: ...
  hostname: local-worker
  entrypoint: airflow celery worker
  ...
  healthcheck:
    test: [ "CMD-SHELL", 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"' ]
    interval: 5s
    timeout: 10s
    retries: 10
  restart: always
  ...
I got inspiration from Airflow's quick-start Docker Compose.
Related
Recently, we had an outage because Redis was unable to write to the file system (not sure why; it's Amazon EFS). Anyway, I noticed that there was no actual HEALTHCHECK set up for the Docker service to make sure it is running correctly. Redis stays up, so I can't simply use nc -z to check whether the port is open.
Is there a command I can execute in the redis:6-alpine (or non-alpine) image that I can put in the healthcheck block of the docker-compose.yml file?
Note: I am looking for a command that is available inside the image, not an external health check.
If I remember correctly, that image includes redis-cli, so maybe something along these lines:
...
healthcheck:
  test: ["CMD", "redis-cli", "ping"]
Although the ping operation from @nitrin0's answer generally works, it does not handle the case where a write operation would actually fail. So instead I perform a change that just increments a key I don't plan to use.
image: redis:6
healthcheck:
  test: [ "CMD", "redis-cli", "--raw", "incr", "ping" ]
I've just noticed that there is a phase in which Redis is still starting up and loading data. In this phase, redis-cli ping shows the error
LOADING Redis is loading the dataset in memory
but still returns exit code 0, which would make Redis report as healthy too early.
Also, redis-cli --raw incr ping returns 0 in this phase without actually incrementing the key successfully.
As a workaround, I'm checking whether the redis-cli ping actually prints a PONG, which it only does after the LOADING has been finished.
services:
  redis:
    healthcheck:
      test: ["CMD-SHELL", "redis-cli ping | grep PONG"]
      interval: 1s
      timeout: 3s
      retries: 5
This works because grep returns 0 only when the string ("PONG") is found.
You can also add it inside the Dockerfile if you're using a Redis image that contains redis-cli:
Linux Docker
HEALTHCHECK CMD redis-cli ping || exit 1
Windows Docker
HEALTHCHECK CMD pwsh.exe -command \
try { \
$response = ./redis-cli ping; \
if ($response -eq 'PONG') { return 0} else {return 1}; \
} catch { return 1 }
I have a docker-compose file for starting a Terraria server, but after starting the server, I can't input any commands. If I start the server directly in my shell, I am able to input commands. How can I get the same behavior in Docker as if I had run the command myself in a shell?
This is the desired behavior, which is what happens when I run it from my shell:
$ TerrariaServerVolume/TerrariaServer -pass xxx -port 7777 -world ~/absolute/path/TerrariaWorldsVolume/testWorldName.wld
Terraria Server v1.4.2.2
Listening on port 7777
Type 'help' for a list of commands.
: Server started
help // my input
Available commands:
... //list of commands
: % //I pressed Ctrl+c
$
This is what actually happens in my docker container:
$ sudo docker-compose up
Terraria Server v1.4.2.2
TerrariaServer_1 |
TerrariaServer_1 | Listening on port 7777
TerrariaServer_1 | Type 'help' for a list of commands.
TerrariaServer_1 |
TerrariaServer_1 | : Server started
^[[6;23
I don't know what ^[[6;23 is, but then here's me trying to input commands:
...
TerrariaServer_1 | : Server started
^[[6;23Rhelp
help
exit
stop
ljadgkljasdgl
^CGracefully stopping... (press Ctrl+C again to force)
Stopping terraria_TerrariaServer_1 ... done
$
This is my setup:
docker-compose.yml
version: "3"
services:
TerrariaServer:
image: "mono:6.8.0.96-slim"
ports:
- 7777:7777
expose:
- 7777
volumes:
- "./TerrariaServerVolume:/Terraria/Server"
- "./TerrariaWorldsVolume:/Terraria/Worlds"
environment:
- WorldName=testWorldName.wld
command: bash -c "/Terraria/Server/TerrariaServer -pass <password> -port 7777 -world /Terraria/Worlds/$WorldName"
stdin_open: true
tty: true
To type other commands after running docker-compose, you need to use the -d parameter.
Example:
docker-compose up -d
From docs:
-d, --detach Detached mode: Run containers in the background, print new container names.
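Since the service already sets stdin_open: true and tty: true, you can then attach to the running container and type commands there; a sketch using the container name from the output above (detach with Ctrl-p Ctrl-q so the server keeps running):
docker-compose up -d
docker attach terraria_TerrariaServer_1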
My custom docker-compose command won't stop gracefully. Why isn't the below working and how can I fix it properly (i.e. no SIGKILL)?
This is mentioned in the docs: https://docs.docker.com/compose/faq
Similar questions (none have helped - see test below):
Docker: gracefully stop django server
How to gracefully stop a Dockerized Python ROS2 node when run with docker-compose up?
Gunicorn graceful stopping with docker-compose
I've written these tests to demonstrate:
version: '3.3'

services:
  # My personal use case
  test1_string:
    image: python:3.9.0rc1-alpine3.12
    command: 'python -u -m smtpd -n -c DebuggingServer 0.0.0.0:1025'

  # https://docs.docker.com/compose/faq/
  # "Compose always uses the JSON form, so don’t worry if you override the command or entrypoint in your Compose file."
  test2_list:
    image: python:3.9.0rc1-alpine3.12
    command: ['python', '-u', '-m', 'smtpd', '-n', '-c', 'DebuggingServer', '0.0.0.0:1025']

  # Maybe it's an issue with networking and syscalls?
  # Using nc once for a comparison to smtpd DebuggingServer above
  test3_bash:
    image: bash:5.0.18
    command: ['bash', '-c', 'nc -k -l 4444 > filename.out']

  # Something simpler, just a sleep.
  test4_bash_sleep:
    image: bash:5.0.18
    command: ['bash', '-c', 'sleep 100']

  # Apparently bash doesn't forward signals, but exec can help (?)
  test5_bash_exec:
    image: bash:5.0.18
    command: ['bash', '-c', 'exec sleep 100']

  # Print any signals sent to the executable
  test6_bash_trap:
    image: bash:5.0.18
    command: ['bash', '-c', 'for s in HUP INT TERM KILL EXIT ; do trap "echo $$s; exit 0" $$s ; done ; sleep 100']

  # Finally, use SIGKILL instead of SIGTERM. This works, but is overkill and not at all graceful.
  test7_kill:
    image: python:3.9.0rc1-alpine3.12
    command: ['python', '-u', '-m', 'smtpd', '-n', '-c', 'DebuggingServer', '0.0.0.0:1025']
    stop_signal: SIGKILL
This is the output, hitting Ctrl-C after they're up:
$ docker-compose --version
docker-compose version 1.25.4, build unknown
$ docker --version
Docker version 19.03.11, build 42e35e6
$ docker-compose up
Creating docker_stop_test2_list_1 ... done
Creating docker_stop_test1_string_1 ... done
Creating docker_stop_test6_bash_trap_1 ... done
Creating docker_stop_test4_bash_sleep_1 ... done
Creating docker_stop_test7_kill_1 ... done
Creating docker_stop_test5_bash_exec_1 ... done
Creating docker_stop_test3_bash_1 ... done
Attaching to docker_stop_test4_bash_sleep_1, docker_stop_test5_bash_exec_1, docker_stop_test2_list_1, docker_stop_test3_bash_1, docker_stop_test6_bash_trap_1, docker_stop_test7_kill_1, docker_stop_test1_string_1
^CGracefully stopping... (press Ctrl+C again to force)
Stopping docker_stop_test5_bash_exec_1 ...
Stopping docker_stop_test3_bash_1 ...
Stopping docker_stop_test6_bash_trap_1 ...
Stopping docker_stop_test7_kill_1 ... done
Stopping docker_stop_test1_string_1 ...
Stopping docker_stop_test2_list_1 ...
Stopping docker_stop_test4_bash_sleep_1 ...
test7_kill is the only one that stops, but it's using SIGKILL. How can I make any of the others work?
I am trying to use Consul as a discovery service, with two other Spring Boot apps registering with Consul, all running in Docker.
The following is my code:
1. The app's application.yml:
server:
  port: 3333
spring:
  application:
    name: adder
  cloud:
    consul:
      host: consul
      port: 8500
      discovery:
        preferIpAddress: true
        healthCheckPath: /health
        healthCheckInterval: 15s
        instanceId: ${spring.application.name}:${spring.application.instance_id:${server.port}}
2. docker-compose.yml:
consul1:
  image: "progrium/consul:latest"
  container_name: "consul1"
  hostname: "consul1"
  command: "-server -bootstrap -ui-dir /ui"

adder:
  image: wsy/adder
  ports:
    - "3333:3333"
  links:
    - consul1
  environment:
    WAIT_FOR_HOSTS: consul1:8500
There is another similar question, Cannot link Consul and Spring Boot app in Docker.
The answer there suggests the app should wait for Consul to be fully up by using depends_on, which I tried, but it didn't work.
The error message is as follows:
adder_1 | com.ecwid.consul.transport.TransportException: java.net.ConnectException: Connection refused
adder_1 | at com.ecwid.consul.transport.AbstractHttpTransport.executeRequest(AbstractHttpTransport.java:80) ~[consul-api-1.1.8.jar!/:na]
adder_1 | at com.ecwid.consul.transport.AbstractHttpTransport.makeGetRequest(AbstractHttpTransport.java:39) ~[consul-api-1.1.8.jar!/:na]
Besides the Spring Boot application.yml and docker-compose.yml, the following is the app's Dockerfile:
FROM java:8
VOLUME /tmp
ADD adder-0.0.1-SNAPSHOT.jar app.jar
RUN bash -c 'touch /app.jar'
ADD start.sh start.sh
RUN bash -c 'chmod +x /start.sh'
EXPOSE 3333
ENTRYPOINT ["/start.sh", " java -Djava.security.egd=file:/dev/./urandom -jar /app.jar"]
and the start.sh
#!/bin/bash
set -e

wait_single_host() {
  local host=$1
  shift
  local port=$1
  shift

  echo "waiting for TCP connection to $host:$port..."
  while ! nc ${host} ${port} > /dev/null 2>&1 < /dev/null
  do
    echo "TCP connection [$host] not ready, will try again..."
    sleep 1
  done
  echo "TCP connection ready. Executing command [$host] now..."
}

wait_all_hosts() {
  if [ ! -z "$WAIT_FOR_HOSTS" ]; then
    local separator=':'
    for _HOST in $WAIT_FOR_HOSTS ; do
      IFS="${separator}" read -ra _HOST_PARTS <<< "$_HOST"
      wait_single_host "${_HOST_PARTS[0]}" "${_HOST_PARTS[1]}"
    done
  else
    echo "IMPORTANT : Waiting for nothing because no $WAIT_FOR_HOSTS env var defined !!!"
  fi
}

wait_all_hosts
exec $1
I can infer that your Consul configuration is located in your application.yml instead of bootstrap.yml; that's the problem.
According to this answer, bootstrap.yml is loaded before application.yml, and the Consul client has to read its configuration before the application itself starts, so it looks at bootstrap.yml.
Example of a working bootstrap.yml:
spring:
  cloud:
    consul:
      host: consul
      port: 8500
      discovery:
        prefer-ip-address: true
Run the Consul server, and do not forget the name option so it matches your configuration:
docker run -d -p 8500:8500 --name=consul progrium/consul -server -bootstrap
The Consul server is now running. Run your application image (built previously from your artifact) and link it to the Consul container:
docker run -d --name=my-consul-client-app --link consul:consul acme/spring-app
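If you prefer to keep everything in docker-compose, a rough, untested sketch equivalent to the two docker run commands above (same images; the consul service/link name is what host: consul in bootstrap.yml resolves to):
consul:
  image: progrium/consul
  command: "-server -bootstrap"
  ports:
    - "8500:8500"

my-consul-client-app:
  image: acme/spring-app
  links:
    - consul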
Your problem is that depends_on only controls the startup order of your services. You have to wait until the Consul server is up and running before starting your Spring app. You can do this with this script:
#!/bin/bash
set -e
default_host="database"
default_port="3306"
host="${2:-$default_host}"
port="${3:-$default_port}"
echo "waiting for TCP connection to $host:$port..."
while ! (echo >/dev/tcp/$host/$port) &>/dev/null
do
  sleep 1
done
echo "TCP connection ready. Executing command [$1] now..."
exec $1
Usage in your Dockerfile:
COPY wait.sh /wait.sh
RUN chmod +x /wait.sh
CMD ["/wait.sh", "java -jar yourApp-jar" , "consulURL" , "ConsulPort" ]
I just want to clarify that, in the end, I still don't have a solution and can't understand the situation here. I tried the suggestion from Ohmen: in the app container I am able to ping consul1, but the app still fails to connect to Consul.
If I only start Consul with the following command:
docker-compose -f docker-compose-discovery.yml up discovery
then I can run the app directly (through Main), and it is able to connect with spring.cloud.consul.host: discovery.
But if I try to run the app in a Docker container, like the following:
docker run --name adder --link discovery-consul:discovery wsy/adder
it fails again with connection refused.
I am very new to Docker and docker-compose; I thought this would be a good example to start with, but it seems it is not that easy for me.
I am trying to deploy Docker containers with Ansible. I have a database in one container and my web app in another container, and I am trying to link these two containers. The problem is that the database container doesn't have time to configure itself before the web app container has already started. My Ansible playbook looks something like:
...
- name: run mysql in docker container
  docker:
    image: "mysql:5.5"
    name: database
    env: "MYSQL_ROOT_PASSWORD=password"
    state: running

- name: run application containers
  docker:
    name: "application"
    image: "myapp"
    ports:
      - "8080:8080"
    links:
      - "database:db"
    state: running
How do I determine whether the database has started? I tried the wait_for module, but that didn't work. I don't want to just set a timeout; that's not a good option for me.
wait_for does not work for the MySQL Docker container because it only checks that the port is connectable (which is true straight away for the Docker container). However, wait_for does not check that the service inside the container is listening on the port and sending responses to the client.
This is how I am waiting in the ansible playbook for the MySQL service becoming fully operational inside the Docker container:
- name: Start MySQL container
  docker:
    name: some-name
    image: mysql:latest
    state: started
    ports:
      - "8306:3306" # it's important to expose the port for waiting requests
    env:
      MYSQL_ROOT_PASSWORD: "{{ mysql_root_password }}"

- template: mode="a+rx,o=rwx" src=telnet.sh.j2 dest=/home/ubuntu/telnet.sh

# wait while MySQL is starting
- action: shell /home/ubuntu/telnet.sh
  register: result
  until: result.stdout.find("mysql_native_password") != -1
  retries: 10
  delay: 3
And the telnet.sh.j2 is
#!/bin/bash -e
telnet localhost 8306 || true
To avoid the shell script (and since I don't normally have telnet installed)...
- name: Wait for database to be available
  shell: docker run --rm --link mysql:mysql mysql sh -c 'mysql -h"$MYSQL_PORT_3306_TCP_ADDR" -P"$MYSQL_PORT_3306_TCP_PORT" -uroot -p{{mysql_password}} || true'
  register: result
  until: result.stderr.find("Can't connect to MySQL") == -1
  retries: 10
  delay: 3
As etrubenok said:
wait_for does not work for the MySQL docker container because it only checks that the port is connectable (which is true straight away for the Docker container). However, wait_for does not check that the service inside the container listens the port and sends responses to the client.
Using Andy Shinn's suggestion on FreshPow's answer, you can wait without needing a shell script or telnet:
- name: Wait for mariadb
  command: >
    docker exec {{ container|quote }}
    mysqladmin ping -u{{ superuser|quote }} -p{{ superuser_password|quote }}
  register: result
  until: not result.rc # or result.rc == 0 if you prefer
  retries: 20
  delay: 3
This runs mysqladmin ping ... until it succeeds (return code 0). Usually superuser is root. I tested using podman instead of docker, but I believe the command is the same regardless. |quote does shell escaping, which according to the Ansible docs should also be done when using the command module.
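For clarity, container, superuser, and superuser_password in the task above are ordinary playbook variables; a purely hypothetical definition might look like:
vars:
  container: mariadb-container                    # hypothetical name of the running MariaDB container
  superuser: root
  superuser_password: "{{ mysql_root_password }}" # assumed to be defined elsewhere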
This works for me just fine:
- name: get mariadb IP address
  command: "docker inspect --format '{''{ .NetworkSettings.IPAddress }''}' mariadb-container"
  register: mariadb_ip_address

- name: wait for mariadb to become ready
  wait_for:
    host: "{{ mariadb_ip_address.stdout }}"
    port: 3306
    state: started
    delay: 5
    connect_timeout: 15
    timeout: 30
Use the wait_for module. I'm no expert on MySQL, but I assume there is some port, file, or log message, etc., you can check to find out whether the DB is up or not.
Here are examples of wait_for copied from the link above.
# wait 300 seconds for port 8000 to become open on the host, don't start checking for 10 seconds
- wait_for: port=8000 delay=10
# wait 300 seconds for port 8000 of any IP to close active connections, don't start checking for 10 seconds
- wait_for: host=0.0.0.0 port=8000 delay=10 state=drained
# wait 300 seconds for port 8000 of any IP to close active connections, ignoring connections for specified hosts
- wait_for: host=0.0.0.0 port=8000 state=drained exclude_hosts=10.2.1.2,10.2.1.3
# wait until the file /tmp/foo is present before continuing
- wait_for: path=/tmp/foo
# wait until the string "completed" is in the file /tmp/foo before continuing
- wait_for: path=/tmp/foo search_regex=completed
# wait until the lock file is removed
- wait_for: path=/var/lock/file.lock state=absent
# wait until the process is finished and pid was destroyed
- wait_for: path=/proc/3466/status state=absent
# wait 300 seconds for port 22 to become open and contain "OpenSSH", don't assume the inventory_hostname is resolvable
# and don't start checking for 10 seconds
- local_action: wait_for port=22 host="{{ ansible_ssh_host | default(inventory_hostname) }}" search_regex=OpenSSH delay=10
I was able to use wait_for like this:
- name: "MySQL - Check mysql - Wait for mysql to be up"
wait_for:
host: 127.0.0.1
port: 3306
search_regex: "(mysql_native_password|caching_sha2_password)"
This way it will wait for the port to be up and for the service to send some data.
The drawback is that the output may change with MySQL versions and configurations. The example shows the strings for MySQL 5.5 and 8.0; adjust for your use cases.
An alternative that avoids running wait_for, command, or shell is to retry some MySQL command several times until it succeeds:
- name: "MySQL - Check mysql - if it responds"
mysql_info:
login_user: root
login_password: "{{ mysql_root_password }}"
filter:
- version
register: mysql_result
until: mysql_result is not failed
retries: 5
delay: 10