The healthcheck is failing when deploying an MS SQL database on AWS ECS.
Below is a copy of the service from the docker-compose.yml file:
sql_server_db:
  image: 'mcr.microsoft.com/mssql/server:2017-latest'
  environment:
    SA_PASSWORD: Password123#
    ACCEPT_EULA: "Y"
  labels:
    - traefik.enable=false
  deploy:
    resources:
      limits:
        cpus: '1'
        memory: 8Gb
      reservations:
        cpus: '0.5'
        memory: 4GB
  healthcheck:
    test: ["/opt/mssql-tools/bin/sqlcmd", "-U", "sa", "-P", "Password123#", "-Q", "SELECT 1"]
    interval: 1m
    retries: 10
    start_period: 60s
I had the same issue. When checking the "inspect" output for the container, I was getting "Login fails for SA".
This was disturbing because the password was the same (I used the .env variable), but for some reason the special characters seem to mess up the check.
I simply created a one-liner script:
/opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P $SA_PASSWORD -Q "Select 1"
and then I called it as the healthcheck:
healthcheck:
  test: ["CMD", "bash", "/healthcheck.sh"]
and it works.
I don't really like it, but I will keep it until I find a better one (I am not sure it can actually fail).
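As a minimal sketch, the wrapper script can be as simple as the following (the /healthcheck.sh path is an assumption about where it gets copied into the image; quoting "$SA_PASSWORD" is a good habit so the shell does not split or reinterpret special characters in the password):
#!/bin/bash
# /healthcheck.sh -- assumed to be copied into the image at build time
# Quote the password so characters like # are passed through to sqlcmd literally.
/opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "$SA_PASSWORD" -Q "SELECT 1" || exit 1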
I have been trying to fine-tune the Docker Compose settings, but I am not satisfied with the result, and the docs are so unspecific about the healthcheck and update_config options.
The scenario is React apps which need to run build and start during entrypoint execution. The builds cannot be done in the Dockerfile, because then I would need to tag redundant images for each environment (amongst other inconveniences).
Because of the build and run steps, it takes about 30 seconds after the container is deployed before the healthcheck gets a positive response from the Node server.
Now, in a rolling-update zero-downtime scenario, what settings would I use? The thing is, I don't need more than 1 replica. The ideal config option would be something like wait_rolling_update_delay, which would keep Docker from replacing containers before this wait time has passed. I am playing around with healthcheck.start_period, but I am not seeing a difference.
deploy:
  mode: replicated
  replicas: 1
  placement:
    constraints:
      - node.role == worker
  labels:
    - "APP=name"
    - "traefik.http.services.name.loadbalancer.server.port=1338"
  restart_policy:
    condition: any
    delay: 10s
    max_attempts: 3
    window: 60s
  update_config:
    parallelism: 1
    delay: 10s
    monitor: 10s
    order: start-first
    failure_action: rollback
healthcheck:
  test: "curl --silent --fail http://localhost:1338/_health || exit 1"
  interval: 10s
  timeout: 120s
  retries: 10
  start_period: 120s
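One possible direction, sketched here with assumed timings rather than tested values: with a healthcheck defined and order: start-first, Swarm should keep the old task running until the new container has reported healthy, so the main knobs are a start_period long enough to cover the roughly 30 second build/start phase, an interval short enough that the first successful probe registers quickly, and a monitor window that at least spans the startup time:
healthcheck:
  test: "curl --silent --fail http://localhost:1338/_health || exit 1"
  interval: 5s            # probe often so the first success is noticed quickly
  timeout: 5s
  retries: 5
  start_period: 60s       # assumed: comfortably longer than the ~30 s build/start
deploy:
  update_config:
    parallelism: 1
    order: start-first
    failure_action: rollback
    monitor: 60s          # assumed: watch the new task at least through its startup window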
I was referring to the example given in the Elasticsearch documentation for starting the Elastic Stack (Elasticsearch and Kibana) on Docker using Docker Compose. It gives an example of a Docker Compose version 2.2 file, so I tried to convert it to a Docker Compose version 3.8 file. It also creates three Elasticsearch nodes and has security enabled. I want to keep it minimal to start with, so I tried to turn off security and also reduce the number of Elasticsearch nodes to 2. This is what my current compose file looks like:
version: "3.8"
services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.0.0-amd64
    volumes:
      - esdata01:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    environment:
      - node.name=es01
      - cluster.name=docker-cluster
      - cluster.initial_master_nodes=es01
      - bootstrap.memory_lock=true
      - xpack.security.enabled=false
    deploy:
      resources:
        limits:
          memory: 1g
    ulimits:
      memlock:
        soft: -1
        hard: -1
    healthcheck:
      # [
      #   "CMD-SHELL",
      #   # "curl -s --cacert config/certs/ca/ca.crt https://localhost:9200 | grep -q 'missing authentication credentials'",
      # ]
      # Changed to:
      test: ["CMD-SHELL", "curl -f http://localhost:9200 || exit 1"]
      interval: 10s
      timeout: 10s
      retries: 120
  kibana:
    depends_on:
      - es01
    image: docker.elastic.co/kibana/kibana:8.0.0-amd64
    volumes:
      - kibanadata:/usr/share/kibana/data
    ports:
      - 5601:5601
    environment:
      - SERVERNAME=kibana
      - ELASTICSEARCH_HOSTS=https://localhost:9200
    deploy:
      resources:
        limits:
          memory: 1g
    healthcheck:
      test:
        [
          "CMD-SHELL",
          "curl -s -I http://localhost:5601 | grep -q 'HTTP/1.1 302 Found'",
        ]
      interval: 10s
      timeout: 10s
      retries: 120
volumes:
  esdata01:
    driver: local
  kibanadata:
    driver: local
Then, I tried to run it:
docker stack deploy -c docker-compose.nosec.noenv.yml elk
Creating network elk_default
Creating service elk_es01
Creating service elk_kibana
When I tried to check their status, it displayed the following:
$ docker container list
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
3dcd08134e38 docker.elastic.co/kibana/kibana:8.0.0-amd64 "/bin/tini -- /usr/l…" 3 minutes ago Up 3 minutes (health: starting) 5601/tcp elk_kibana.1.ng8aspz9krfnejfpsnqzl2sci
7b548a43c45c docker.elastic.co/elasticsearch/elasticsearch:8.0.0-amd64 "/bin/tini -- /usr/l…" 3 minutes ago Up 3 minutes (healthy) 9200/tcp, 9300/tcp elk_es01.1.d9a107j6wkz42shti3n6kpfmx
I noticed that Kibana's status gets stuck at (health: starting). When I checked Kibana's logs with the command docker service logs -f elk_kibana, they had the following WARN and ERROR lines:
[WARN ][plugins.security.config] Generating a random key for xpack.security.encryptionKey. To prevent sessions from being invalidated on restart, please set xpack.security.encryptionKey in the kibana.yml or use the bin/kibana-encryption-keys command.
[WARN ][plugins.security.config] Session cookies will be transmitted over insecure connections. This is not recommended.
[WARN ][plugins.security.config] Generating a random key for xpack.security.encryptionKey. To prevent sessions from being invalidated on restart, please set xpack.security.encryptionKey in the kibana.yml or use the bin/kibana-encryption-keys command.
[WARN ][plugins.security.config] Session cookies will be transmitted over insecure connections. This is not recommended.
[WARN ][plugins.reporting.config] Generating a random key for xpack.reporting.encryptionKey. To prevent sessions from being invalidated on restart, please set xpack.reporting.encryptionKey in the kibana.yml or use the bin/kibana-encryption-keys command.
[WARN ][plugins.reporting.config] Found 'server.host: "0.0.0.0"' in Kibana configuration. Reporting is not able to use this as the Kibana server hostname. To enable PNG/PDF Reporting to work, 'xpack.reporting.kibanaServer.hostname: localhost' is automatically set in the configuration. You can prevent this message by adding 'xpack.reporting.kibanaServer.hostname: localhost' in kibana.yml.
[ERROR][elasticsearch-service] Unable to retrieve version information from Elasticsearch nodes. connect ECONNREFUSED 127.0.0.1:9200
It seems that Kibana is not able to connect to Elasticsearch, but why? Is it because security is disabled, and we cannot have security disabled?
PS-1: Earlier, when I set the Elasticsearch host as follows in Kibana's environment in the Docker Compose file:
ELASTICSEARCH_HOSTS=https://es01:9200 # that is 'es01' instead of `localhost`
it gave me the following error:
[ERROR][elasticsearch-service] Unable to retrieve version information from Elasticsearch nodes. getaddrinfo ENOTFOUND es01
So, after checking this question, I changed es01 to localhost as specified earlier (that is, in the complete Docker Compose file content before PS-1).
PS-2: Replacing localhost with 192.168.0.104 gives the following errors:
[ERROR][elasticsearch-service] Unable to retrieve version information from Elasticsearch nodes. connect ECONNREFUSED 192.168.0.104:9200
[ERROR][elasticsearch-service] Unable to retrieve version information from Elasticsearch nodes. write EPROTO 140274197346240:error:1408F10B:SSL routines:ssl3_get_record:wrong version number:../deps/openssl/openssl/ssl/record/ssl3_record.c:332:
Try this:
ELASTICSEARCH_HOSTS=http://es01:9200
I don't know why it runs on my PC, since Elasticsearch is supposed to use SSL. But in your case, using http works just fine.
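In the compose file from the question, that means pointing Kibana at the es01 service name over plain HTTP (only the changed lines are sketched below). Inside the Kibana container, localhost refers to the Kibana container itself, which is why the ECONNREFUSED 127.0.0.1:9200 error appears; the es01 service name should resolve over the stack's overlay network once the es01 task is running, and with xpack.security.enabled=false Elasticsearch listens on plain http:
  kibana:
    environment:
      - SERVERNAME=kibana
      - ELASTICSEARCH_HOSTS=http://es01:9200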
docker-compose.yml
services:
  {{ app }}{{ env_id }}-{{stage_name}}:
    image: "{{ registry_url }}/{{ app }}-{{ stage_name }}:{{ tag }}"
    ports:
      - {{ port }}:3000
    volumes:
      - /var/log/{{ app }}/logs:/app/logs
    networks:
      - net{{ env_id }}
    hostname: "{{contain_name}}"
    logging:
      driver: syslog
      options:
        tag: "{{ app }}"
    stop_grace_period: 20s
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/version"]
      interval: 5s
      timeout: 10s
      retries: 3
      start_period: 5s
    deploy:
      replicas: 4
      update_config:
        parallelism: 1
        order: start-first
        failure_action: rollback
        monitor: 15s
      rollback_config:
        order: start-first
      restart_policy:
        condition: any
        delay: 5s
      resources:
        limits:
          memory: 7G
networks:
  net{{ env_id }}:
    name: {{ app }}{{ env_id }}_network
Using this docker-compose.yml, I get a swarm stack with four containers, but the containers all have the same hostname. I want them to be named like
contain_name1
contain_name2
contain_name3
contain_name4
How can I do that?
Unfortunately, this functionality is not yet supported. https://github.com/docker/swarmkit/issues/1242
Kubernetes can resolve this problem by using StatefulSet. https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
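For comparison, a hedged sketch of the Kubernetes approach (all names below are illustrative, not taken from the question): a StatefulSet gives each replica a stable, ordinal hostname such as app-0 through app-3:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: app                    # pods are named app-0, app-1, app-2, app-3
spec:
  serviceName: app             # headless Service providing the stable network identity
  replicas: 4
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: registry.example.com/app:latest    # illustrative image
          ports:
            - containerPort: 3000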
Docker updates the container, but network registration takes 10 minutes to complete, so while the new container is being registered the page returns 502 because the internal network is still pointing at the old container. How can I delay the removal of the old container by 10 minutes or so after the update to the new container? Ideally I would like to push this config with docker stack, but I'll do whatever it takes. I should also note that I am unable to use replicas right now due to certain limitations of a security package I'm being forced to use.
version: '3.7'
services:
  xxx:
    image: ${xxx}/com.xxx:${xxx}
    environment:
      - SERVICE_NAME=xxx
      - xxx
      - _xxx
      - SPRING_PROFILES_ACTIVE=${xxx}
    networks:
      - xxx${xxx}
    healthcheck:
      interval: 1m
    deploy:
      mode: replicated
      replicas: 1
      resources:
        limits:
          cpus: '3'
          memory: 1024M
        reservations:
          cpus: '0.50'
          memory: 256M
      labels:
        - com.docker.lb.hosts=xxx${_xxx}.xxx.com
        - jenkins.url=${xxx}
        - com.docker.ucp.access.label=/${xxx}/xxx
        - com.docker.lb.network=xxx${_xxx}
        - com.docker.lb.port=8080
        - com.docker.lb.service_cluster=${xxx}
        - com.docker.lb.ssl_cert=xxx.cert
        - com.docker.lb.ssl_key=xxx.key
        - com.docker.lb.redirects=http://xxx${_xxx}.xxx.com/xxx,https://xxx${_xxx}.xxx.com/xxx
      restart_policy:
        condition: any
        delay: 5s
        max_attempts: 3
        window: 120s
      update_config:
        parallelism: 1
        delay: 10s
        order: start-first
        failure_action: rollback
      rollback_config:
        parallelism: 0
        order: stop-first
    secrets:
      - ${xxx}
networks:
  xxx${_xxx}:
    external: true
secrets:
  ${xxx}:
    external: true
  xxx.cert:
    external: true
  xxx.key:
    external: true
Use a proper healthcheck - see the reference here: https://docs.docker.com/compose/compose-file/#healthcheck
So:
You need to define a proper test to know when your new container is fully up (that goes inside the test instruction of your healthcheck).
Use the start_period instruction to specify your 10 (or so) minute wait - otherwise, Docker Swarm would just kill your new container and never let it start.
Basically, once you get healthcheck right, this should solve your issue.
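A hedged sketch of how that could look for the service above (the /health endpoint is an assumption, and the port is guessed from the com.docker.lb.port=8080 label, neither is taken from the question):
healthcheck:
  test: ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"]   # assumed endpoint
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 10m        # failures during the ~10 minute registration window do not count against retries
Combined with the update_config order: start-first already in the question's config, the old task should only be removed once the new one passes this check.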
I'm trying to add a healthcheck to a mariadb service, but it then prevents the client connection to the server, showing the message:
ERROR 2005 (HY000): Unknown MySQL server host 'mysql' (0)
The connection can be established and everything runs fine without the following healthcheck:
healthcheck:
  test: /usr/bin/learnintouch/db-health-check.sh
  interval: 30s
  timeout: 10s
  retries: 3
... with the healthcheck behaving as expected:
root@428b1e6c7c1a:/usr/bin/learnintouch/www/folkuniversitet# /usr/bin/learnintouch/db-health-check.sh
+ /usr/bin/mysql/install/bin/mysql --protocol=tcp -h mysql -u root -pxxxxxx -e 'show databases;'
Warning: Using a password on the command line interface can be insecure.
+--------------------+
| Database |
+--------------------+
| db_engine |
| db_learnintouch |
| information_schema |
| mysql |
| performance_schema |
| test |
+--------------------+
The initial working service configuration is:
mysql:
  image: localhost:5000/mariadb:10.1.24
  environment:
    - MYSQL_ROOT_PASSWORD=xxxxxx
  networks:
    - learnintouch
  volumes:
    - "~/dev/docker/projects/learnintouch/volumes/database/data:/usr/bin/mariadb/install/data"
    - "~/dev/docker/projects/learnintouch/volumes/logs:/usr/bin/mariadb/install/logs"
  deploy:
    replicas: 1
    restart_policy:
      condition: any
      delay: 5s
      max_attempts: 3
      window: 10s
The failing service configuration is:
mysql:
  image: localhost:5000/mariadb:10.1.24
  environment:
    - MYSQL_ROOT_PASSWORD=xxxxxx
  networks:
    - learnintouch
  volumes:
    - "~/dev/docker/projects/learnintouch/volumes/database/data:/usr/bin/mariadb/install/data"
    - "~/dev/docker/projects/learnintouch/volumes/logs:/usr/bin/mariadb/install/logs"
  deploy:
    replicas: 1
    restart_policy:
      condition: any
      delay: 5s
      max_attempts: 3
      window: 10s
  healthcheck:
    test: /usr/bin/learnintouch/db-health-check.sh
    interval: 30s
    timeout: 10s
    retries: 3
At server startup, the mariadb log is exactly the same, with or without the healthcheck, and ends with:
2017-12-29 10:27:48 139873749194560 [Note] /usr/bin/mariadb/install/bin/mysqld: ready for connections.
Version: '10.1.24-MariaDB' socket: '/usr/bin/mariadb/install/tmp/mariadb.sock' port: 3306 Source distribution
Here is the db-health-check.sh bash script:
#!/bin/bash -x
/usr/bin/mysql/install/bin/mysql --protocol=tcp -h mysql -u root -pxxxxxx -e "show databases;" || exit 1
I'm on Docker version:
Client:
Version: 17.05.0-ce
API version: 1.29
Go version: go1.7.5
Git commit: 89658be
Built: Thu May 4 22:10:54 2017
OS/Arch: linux/amd64
Server:
Version: 17.05.0-ce
API version: 1.29 (minimum version 1.12)
Go version: go1.7.5
Git commit: 89658be
Built: Thu May 4 22:10:54 2017
OS/Arch: linux/amd64
Experimental: false
UPDATE:
There were in fact a few issues, which together made the whole thing puzzling.
The first one was that the db-health-check.sh file was sitting in the client application container, and thus could not be found by the database service container when the latter needed to perform its healthcheck.
The second one was that the db-health-check.sh file used a wrong path for the mysql client, which in this container lives at /usr/bin/mariadb/install/bin/mysql.
The third one was that the db-health-check.sh file used the wrong hostname in its connection, mysql instead of localhost.
The fourth one was that the root user could not connect, failing with ERROR 1045 (28000): Access denied for user 'root'@'localhost', as it should not have any password, as in /usr/bin/mariadb/install/bin/mysql --protocol=tcp -h localhost -u root -e "show databases;" || exit 1.
After correcting these points, the application can start and run fine even with the healthcheck in place.
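For reference, a hedged sketch of the corrected db-health-check.sh as described above (it now has to live inside the database container, so the client path reflects that image's layout):
#!/bin/bash -x
# Connect locally, as root without a password, using the mariadb client shipped in this image.
/usr/bin/mariadb/install/bin/mysql --protocol=tcp -h localhost -u root -e "show databases;" || exit 1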
The one thing that blurred the lines even further was that the service is not available until the first healthcheck has passed successfully. Indeed, I had configured a healthcheck as:
healthcheck:
  test: exit 0
  interval: 60s
  timeout: 10s
  retries: 3
with the container state showing this at first:
"Health": {
"Status": "starting",
"FailingStreak": 0,
"Log": []
}
and the application was still not able to connect to the service until the first interval of 60s had expired and the healthcheck had succeeded.
I should therefore keep the interval short, so that the application does not have to wait too long for the service to become available.
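For example (the values are illustrative, not tested), something like:
healthcheck:
  test: /usr/bin/learnintouch/db-health-check.sh
  interval: 10s        # short interval so the first successful check, and thus availability, comes quickly
  timeout: 10s
  retries: 3
would make the service usable much sooner after startup.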