I have a project using the official nginx docker container from Docker Hub, launching via Docker Compose. I have healthchecks configured in Docker Compose for each of my containers, and recently the healthcheck for this nginx container has been behaving strangely; on launching with docker-compose up -d, all my containers launch, and begin running healthchecks, but the nginx container looks like it never runs the healthcheck. I can manually run the script just fine if I docker exec into the container, and the healthcheck runs normally if I restart the container.
Example output from docker ps:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
458a55ae8971 my_custom_image "/tini -- /usr/local…" 7 minutes ago Up 7 minutes (healthy) project_worker_1
5024781b1a73 redis:3.2 "docker-entrypoint.s…" 7 minutes ago Up 7 minutes (healthy) 127.0.0.1:6379->6379/tcp project_redis_1
bd405dde8ce7 postgres:9.6 "docker-entrypoint.s…" 7 minutes ago Up 7 minutes (healthy) 127.0.0.1:15432->5432/tcp project_postgres_1
93e15c18d879 nginx:mainline "nginx -g 'daemon of…" 7 minutes ago Up 7 minutes (health: starting) 127.0.0.1:80->80/tcp, 127.0.0.1:443->443/tcp nginx
Example (partial, for brevity) output from docker inspect nginx:
"State": {
"Status": "running",
"Running": true,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 11568,
"ExitCode": 0,
"Error": "",
"StartedAt": "2018-02-13T21:04:22.904241169Z",
"FinishedAt": "0001-01-01T00:00:00Z",
"Health": {
"Status": "unhealthy",
"FailingStreak": 0,
"Log": []
}
},
The portion of the docker-compose.yml defining the nginx container:
nginx:
image: nginx:mainline
# using container_name means there will only ever be one nginx container!
container_name: nginx
restart: always
networks:
- proxynet
volumes:
- /etc/nginx/conf.d
- /etc/nginx/vhost.d
- /usr/share/nginx/html
- tlsdata:/etc/nginx/certs:ro
- attachdata:/usr/share/nginx/html/uploads:ro
- staticdata:/usr/share/nginx/html/static:ro
- ./nginx/healthcheck.sh:/bin/healthcheck.sh
healthcheck:
test: ['CMD', '/bin/healthcheck.sh']
interval: 1m
timeout: 5s
retries: 3
ports:
# Make the http/https ports available on the Docker host IPv4 loopback interface
- '127.0.0.1:80:80'
- '127.0.0.1:443:443'
The healthcheck.sh I am loading in as a volume:
#!/bin/bash
service nginx status || exit 1
It looks like the problem is just an issue with systemd never returning from the status check when the container initially launches, and at the same time the configured healthcheck timeout does not trigger. Everything else works, and nginx is up and responding, but it would be nice for the healthcheck to function properly without needing to manually restart each time I start up.
Is there something missing in my configuration, or a better check I can run?
I think that there is no need for a custom script in this case.
Try just change your healthcheck test to
test: ["CMD", "service", "nginx", "status"]
That works fine for me.
Try to use " instead of ' as well, just in case :)
EDIT
If you really want to force an exit 1, in case of failure, you could use:
test: service nginx status || exit 1
for the official alpine nginx image you can also do:
healthcheck:
test: ["CMD-SHELL", "wget -O /dev/null http://localhost || exit 1"]
timeout: 10s
wget is part of the standard image. What this does is download your index.html/php/whatever to nowhere (/dev/null), and it should timeout and fail otherwise.
I attempted the same script and encountered the same issue. I changed the healthcheck.sh to instead run like this:
#!/bin/bash
if service nginx status; then
exit 0
else
exit 1
fi
Running this in the docker container resulted in successful health checks.
Over a year later, I have found a solution. First, an additional clarification on the environment, what I believe is happening, and speculation on a possible bug with the Docker Engine.
The Compose file I am using now is launching a lightly modified version of the 'official' Alpine NGINX image, which uses COPY to load in the healthcheck script and adds HEALTHCHECK explicitly in the image. This image is used for an nginx service, and is used in concert with an image running jwilder/docker-gen to use container metadata from Docker to generate NGINX configuration files. This container is running as a service named nginx-gen. When containers change, configuration is re-generated, and if there are any changes, a SIGHUP is sent to the nginx service.
What I discovered is the following:
If all services are launched together, the nginx service never runs healthchecks;
If the nginx service is restarted soon after launch, healthchecks complete normally;
If the nginx service is launched by itself, healthchecks complete normally;
If all services other than nginx-gen are launched together, healthchecks complete normally;
If all services are launched together, but nginx-gen is modified to sleep 60 before doing anything, healthchecks complete normally;
So, it appears that there is some obscure interaction with signal processing, Docker, and NGINX. If a SIGHUP is sent to an NGINX process in a container before the first healthcheck runs in that container, no healthchecks ever run.
The final iteration I came up with modifies the nginx-gen container to poll the health of the nginx container. It looks up the health status of a container with a defined label in a loop, with a short sleep. Once the nginx container reports healthy, nginx-gen proceeds to generate configuration files. I also changed the notification method to docker exec a script to explicitly test and reload configuration in the nginx container, rather than rely on SIGHUP.
End result: I can docker-compose up -d, and everything eventually reports healthy without further intervention. Success!
Related
I have created a container running apache2 http server, loaded my certificates and https://mydomain works, however http://mydomain works too, and if I digit on my browser mydomain the browser open http://mydomain. Is there a way to disable http protocol? I use only -p 443:443 while starting the container.
This is my Dockerfile
ARG version=2.4.48-alpine
FROM httpd:$version
LABEL version=1.0
COPY ./public_html/ /usr/local/apache2/htdocs/
# run web traffic over SSL/HTTPS
COPY ./cert/srv.crt /usr/local/apache2/conf/
COPY ./cert/srv.key /usr/local/apache2/conf/
RUN ["sed", "-i", "-e", "'s/^#\(Include .*httpd-ssl.conf\)/\1/'", "-e", "'s/^#\(LoadModule .*mod_ssl.so\)/\1/'", "-e", "'s/^#\(LoadModule .*mod_socache_shmcb.so\)/\1/'", "conf/httpd.conf"]
EXPOSE 443/tcp
and this is the outpuf of docker ps:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
21678d6321e4 webserver "/bin/sh" 2 hours ago Up About an hour 80/tcp, 0.0.0.0:443->443/tcp webserver
I resolved the issue redirecting http to https exposing also the 80 port.
I'm trying to run Cypress tests against containerized Nginx:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7c3efd24e6e6 tdd_nginx "/docker-entrypoint.…" 19 minutes ago Up 19 minutes 0.0.0.0:80->80/tcp, :::80->80/tcp tdd_nginx_1
from official docs I learned I can use docker run -it -v $PWD:/e2e -w /e2e -e CYPRESS_baseUrl=host.docker.internal cypress/included:7.7.0
Here I learned about host.docker.internal which is how supposedly Cypress knows to look for localhost in a particular container.
Nginx container has exposed port 80 so I've tried -e CYPRESS_baseUrl=host.docker.internal:80 as well as without specifying port as port 80 is a fallback port in most cases.
error output:
Cypress could not verify that this server is running:
> http://host.docker.internal:80
We are verifying this server because it has been configured as your `baseUrl`.
Cypress automatically waits until your server is accessible before running tests.
We will try connecting to it 3 more times...
We will try connecting to it 2 more times...
We will try connecting to it 1 more time...
Cypress failed to verify that your server is running.
Please start this server and then run Cypress again.
Moving the env variable into cypress.json made no difference:
{
"baseUrl": "host.docker.internal",
"video": false
}
Changed the cypress.json to:
{
"CYPRESS_BASE_URL": "host.docker.internal",
"video": false
}
parsing CYPRESS_BASE_URL as env variable didn't help but putting it into the file did the trick. Strangely, it makes difference.
Thanks goes to #jonrsharpe
I am using the official docker-compose file of airflow to spin it up.
Some of my containers seem unhealthy:
34d8698d67e7 apache/airflow:2.0.2 "/usr/bin/dumb-init …" 31 minutes ago Up 28 minutes (unhealthy) 0.0.0.0:5555->5555/tcp, :::5555->5555/tcp, 8080/tcp airflow_flower_1
a291cf238b9f apache/airflow:2.0.2 "/usr/bin/dumb-init …" 31 minutes ago Up 29 minutes 8080/tcp airflow_airflow-init_1
fdb20e9152f3 apache/airflow:2.0.2 "/usr/bin/dumb-init …" 31 minutes ago Up 29 minutes (unhealthy) 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp airflow_airflow-webserver_1
abf5a16aa846 apache/airflow:2.0.2 "/usr/bin/dumb-init …" 31 minutes ago Up 29 minutes 8080/tcp airflow_airflow-worker_1
f6dc352f407b apache/airflow:2.0.2 "/usr/bin/dumb-init …" 31 minutes ago Up 28 minutes 8080/tcp airflow_airflow-scheduler_1
12dfc71e518f redis:latest "docker-entrypoint.s…" 31 minutes ago Up 29 minutes (healthy) 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp airflow_redis_1
However the logs of one of them for example do not seem very informative.
# docker logs -f fdb20e9152f3
WARNING! You should run the image with GID (Group ID) set to 0
even if you use 'airflow' user (UID=50000)
You started the image with UID=50000 and GID=50000
This is to make sure you can run the image with an arbitrary UID in the future.
See more about it in the Airflow's docker image documentation
http://airflow.apache.org/docs/docker-stack/entrypoint
BACKEND=postgresql+psycopg2
DB_HOST=my-db-endpoint
DB_PORT=5432
WARNING! You should run the image with GID (Group ID) set to 0
even if you use 'airflow' user (UID=50000)
You started the image with UID=50000 and GID=50000
This is to make sure you can run the image with an arbitrary UID in the future.
See more about it in the Airflow's docker image documentation
http://airflow.apache.org/docs/docker-stack/entrypoint
BACKEND=postgresql+psycopg2
DB_HOST=my-db-endpoint
DB_PORT=5432
Regardless of any airflow - specific issues, how can I check docker - wise what's going on?
Docker seems to be aware of a couple of containers not being healty.
edit: both failing containers have the healtcheck condition
healthcheck:
test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
and
healthcheck:
test: ["CMD", "curl", "--fail", "http://localhost:8080/"]
that seems to be failing by looking into their inspect output
Failed to connect to localhost port 8080: Connection refused
but I cannot pinpoint what is causing the failure.
edit: I have tried following the instructions to start the init service first as well
# docker-compose up airflow-init
Starting airflow_redis_1 ... done
Starting airflow_airflow-init_1 ... done
Attaching to airflow_airflow-init_1
airflow-init_1 | BACKEND=postgresql+psycopg2
airflow-init_1 | DB_HOST=my-db-endpoint
airflow-init_1 | DB_PORT=5432
but it never exits, it prints the above message and that's it...
I ran into similar issue and it was docker volume causing the issue. As I was running lots of containers on my mac, there wasn;t enough disk space. I managed to fixed this issue my pruning the docker volume.
docker volume prune
This will remove any unused volume on your mac book. Before running this command please check if you got any useful data.
For docker-compose, from the entrypoint, the default value of group id is 0.
"${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-0}"
Edit your docker-compose.yaml file or ad ass env.sh file in your Airflow project repository.
It does seems to be an error due to less memory allocated to docker for running this image, try to increase the resources available to docker and see the magic
I ran into similar issue and the healthchecks were "causing" this. I was running them using default container's user.
Just to give a try, I changed the healthckeck command to start using airflow user instead, as follow:
$ runuser -u airflow -- <healthckeck command>
And it solved. I'm gonna change the user whom runs docker compose up to airflow from now on.
I have a customized rabbitmq image that I am using with docker-compose (3.7) to launch a docker cluster. This is necessary because of some peculiar issues when trying to deploy a cluster in docker swarm. The image has a shell script which runs on the primary and secondary nodes and makes the modifications needed to run a cluster. This involves stopping rabbitmq and running rabbitmqctl commands to create the cluster between the two nodes. This configuiration works flawlessly until I try to add in a healthcheck. I have tried adding it in to the image and adding it into the compose file. Both cause the image to crash and constantly restart. I have the following shell script which gets copied into the image:
#!/bin/bash
set -eo pipefail
# A RabbitMQ node is considered healthy if all the below are true:
# * the rabbit app finished booting & it's running
# * there are no alarms
# * there is at least 1 active listener
rabbitmqctl eval '
{ true, rabbit_app_booted_and_running } = { rabbit:is_booted(node()), rabbit_app_booted_and_running },
{ [], no_alarms } = { rabbit:alarms(), no_alarms },
[] /= rabbit_networking:active_listeners(),
rabbitmq_node_is_healthy.
' || exit 1
On an already running image this works and produces the correct result.
I tried the flowing in the compose file:
healthcheck:
interval: 60s
timeout: 60s
retries: 10
start_period: 600s
test: ["CMD", "docker-healthcheck"]
It seems that the start_period is completely ignored. I can see the health status with an error right away. I have also tried the following native rabbitmq diagnostics command:
rabbitmq-diagnostics -q check_running && rabbitmq-diagnostics -q check_local_alarms
This oddly fails with an "unable to find rabbitmq-diagnostics" error, despite the fact the program is definitely in the path. I can execute the command successfully in an already running container.
If I create the container without the healthcheck and then add it in after the fact from the command line with:
docker service update --health-cmd docker-healthcheck --health-interval 60s --health-timeout 60s --health-retries 10 [container id]
it marks the container healthy. So it works but just not in a start up configuration. It seems like to me that the healthcheck should not begin until 10 minutes have passed. It doesn't seem to matter how long I wait for everything to startup using the start_period parameter it still causes the container to fail.
Is this a bug or is there something mysterious about the way start_period works?
Anyone else every have this problem?
I would like to scale a wildfly container having exposed multiple ports with deterministic results.
docker-compose.yml
version: '3'
services:
wildfly-server:
build:
context: .
dockerfile: Dockerfile
args:
admin_user: admin
admin_password: admin
deploy:
resources:
limits:
memory: 1.5G
cpus: "1.5"
restart: always
ports:
- "8000-8099:8080"
- "8100-8199:9990"
- "8200-8299:8787"
expose:
- "8080"
- "9990"
- "8787"
Dockerfile
FROM jboss/wildfly:16.0.0.Final
# DOCKER ENV VARIABLES
ENV WILDFLY_HOME /opt/jboss/wildfly
ENV STANDALONE_DIR ${WILDFLY_HOME}/standalone
ENV DEPLOYMENT_DIR ${STANDALONE_DIR}/deployments
ENV CONFIGURATION_DIR ${STANDALONE_DIR}/configuration
RUN ${WILDFLY_HOME}/bin/add-user.sh ${admin_user} ${admin_password} --silent
# OPENING DEBUG PORT
RUN rm ${WILDFLY_HOME}/bin/standalone.conf
ADD standalone.conf ${WILDFLY_HOME}/bin/
# SET JAVA ENV VARS
RUN rm ${CONFIGURATION_DIR}/standalone.xml
ADD standalone.xml ${CONFIGURATION_DIR}/
Command to start
docker-compose up --build --force-recreate --scale wildfly-server=10
It almost works as I want to, but there is some port discrepancy. When I create the containers, I want them to have incremental ports for each container to be exposed as follows:
machine_1 8001, 8101, 82001
machine_2 8002, 8102, 82002
machine_3 8003, 8103, 82003
But what I get as a result is not deterministic and looks like this:
machine_1 8001, 8102, 82003
machine_2 8002, 8101, 82001
machine_3 8003, 8103, 82002
The problem is that every time I run the compose up command, the ports are different for each container.
Example output:
CONTAINER ID COMMAND CREATED STATUS PORTS NAMES
0232f24fbca4 "/opt/jboss/wildfly/…" 5 minutes ago Up 5 minutes 0.0.0.0:8028->8080/tcp, 0.0.0.0:8231->8787/tcp, 0.0.0.0:8126->9990/tcp wildfly-server_7
13a6a365a552 "/opt/jboss/wildfly/…" 5 minutes ago Up 5 minutes 0.0.0.0:8031->8080/tcp, 0.0.0.0:8230->8787/tcp, 0.0.0.0:8131->9990/tcp wildfly-server_10
bf8260d9874d "/opt/jboss/wildfly/…" 5 minutes ago Up 5 minutes 0.0.0.0:8029->8080/tcp, 0.0.0.0:8228->8787/tcp, 0.0.0.0:8129->9990/tcp wildfly-server_6
3d58f2e9bdfe "/opt/jboss/wildfly/…" 5 minutes ago Up 5 minutes 0.0.0.0:8030->8080/tcp, 0.0.0.0:8229->8787/tcp, 0.0.0.0:8130->9990/tcp wildfly-server_9
7824a73a09f5 "/opt/jboss/wildfly/…" 5 minutes ago Up 5 minutes 0.0.0.0:8027->8080/tcp, 0.0.0.0:8227->8787/tcp, 0.0.0.0:8128->9990/tcp wildfly-server_3
85425462259d "/opt/jboss/wildfly/…" 5 minutes ago Up 5 minutes 0.0.0.0:8024->8080/tcp, 0.0.0.0:8224->8787/tcp, 0.0.0.0:8124->9990/tcp wildfly-server_2
5be5bbe8e577 "/opt/jboss/wildfly/…" 5 minutes ago Up 5 minutes 0.0.0.0:8026->8080/tcp, 0.0.0.0:8226->8787/tcp, 0.0.0.0:8127->9990/tcp wildfly-server_8
2512fc0643a3 "/opt/jboss/wildfly/…" 5 minutes ago Up 5 minutes 0.0.0.0:8023->8080/tcp, 0.0.0.0:8223->8787/tcp, 0.0.0.0:8123->9990/tcp wildfly-server_5
b156de688dcb "/opt/jboss/wildfly/…" 5 minutes ago Up 5 minutes 0.0.0.0:8025->8080/tcp, 0.0.0.0:8225->8787/tcp, 0.0.0.0:8125->9990/tcp wildfly-server_4
3e9401552b0a "/opt/jboss/wildfly/…" 5 minutes ago Up 5 minutes 0.0.0.0:8022->8080/tcp, 0.0.0.0:8222->8787/tcp, 0.0.0.0:8122->9990/tcp wildfly-server_1
Question
Is there any way to make the port distribution deterministic? Like disable parallel running to have serial checks on the available ports or any other method? The only alternative I found is to have a yml template and generate all the necessary files (like 10 if I need 10 containers etc). Are there any alternative solutions?
No, you cannot currently (10/14/19) make the port selection deterministic in the docker-compose file. This behavior was requested in Github issues #722 and #1247, but those issues were closed without the issue having been implemented.
If you want to semi-dynamically scale an application like it sounds like you do, then you'll need to solve this another way. Your .yml templating idea sounds like the cleanest solution IMO.
Are you sure you need the ports to be deterministic? If you use a reverse proxy like nginx that listens on one host port and balances the load between all of your docker containers, would that work for your use case? Setting up an nginx load balancer in a docker container is pretty straightforward. I suggest you look into that, and if you still need a deterministic way for a caller to know the service's port so it can send a request to a specific server repeatedly, then go with your .yml templating solution or some kind of service discovery process separate from the docker-compose configuration.
You could do this using variable substitution:
yaml:
...
ports:
${PORT}:8080
Then call docker with the specific ports:
for p in {8000..8099}; do
PORT=$p docker-compose ...
done