Docker Swarm: guarantee high availability after restart

I have an issue using Docker swarm.
I have 3 replicas of a Python web service running on Gunicorn.
The issue is that when I restart the swarm service after a software update, an old running instance is killed, then a new one is created and started. But in the short window when the old instance has already been killed and the new one has not fully started yet, network traffic is already routed to the new instance, which isn't ready, resulting in 502 Bad Gateway errors (I proxy to the service from nginx).
I use the --update-parallelism 1 --update-delay 10s options, but this doesn't eliminate the issue; it only slightly reduces the chance of getting a 502 error (because there are always at least 2 instances running, even if one of them might still be starting up).

So, following what I proposed in the comments:
Use the HEALTHCHECK instruction in your Dockerfile (see the Docker docs). Something like:
HEALTHCHECK --interval=5m --timeout=3s \
CMD curl -f http://localhost/ || exit 1
Because Docker Swarm honors this healthcheck during service updates, it's relatively easy to get a zero-downtime deployment.
But as you mentioned, your health check is resource-intensive, so you need longer healthcheck intervals.
In that case, I recommend customizing your healthcheck so that the first run happens immediately and subsequent checks happen only when current_minute % 5 == 0, while the healthcheck command itself runs every 30s:
HEALTHCHECK --interval=30s --timeout=3s \
CMD /healthcheck.sh
healthcheck.sh:
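#!/bin/bash
# date +%M is zero-padded; force base 10 so 08 and 09 are not rejected as octal.
CURRENT_MINUTE=$(date +%M)
CURRENT_MINUTE=$((10#$CURRENT_MINUTE))
INTERVAL_MINUTE=5

do_healthcheck() {
    curl -f http://localhost/ || exit 1
}

# First run: check immediately so a freshly started container is verified right away.
if [ ! -f /tmp/healthcheck.first.run ]; then
    do_healthcheck
    touch /tmp/healthcheck.first.run
    exit 0
fi

# Afterwards, run the real check only in minutes that are a multiple of $INTERVAL_MINUTE.
[ $((CURRENT_MINUTE % INTERVAL_MINUTE)) -eq 0 ] && do_healthcheck
exit 0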
Remember to COPY the healthcheck.sh to /healthcheck.sh (and chmod +x)
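One pitfall with minute arithmetic like the check above: date +%M prints zero-padded values such as 08 and 09, which shell arithmetic rejects as invalid octal numbers. Below is a minimal sketch of the gating logic that avoids this. The helper name should_run_check is an assumption for illustration and is not part of the original script.

```shell
#!/bin/sh
# should_run_check MINUTE INTERVAL
# Hypothetical helper: succeeds (status 0) when MINUTE is a multiple of INTERVAL.
# MINUTE may be zero-padded, as printed by `date +%M`.
should_run_check() {
    minute=${1#0}        # drop one leading zero: "08" -> "8", "00" -> "0", "0" -> ""
    minute=${minute:-0}  # a bare "0" was stripped to "", so restore it
    [ $((minute % $2)) -eq 0 ]
}

if should_run_check "$(date +%M)" 5; then
    echo "this minute is a multiple of 5: run the expensive check"
else
    echo "skip the expensive check this time"
fi
```

Stripping the leading zero with parameter expansion keeps the helper portable to plain sh; in a bash-only script, $((10#$minute)) achieves the same thing.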

There are some known issues (e.g. moby/moby #30321) with rolling upgrades in Docker Swarm in the current 17.05 and earlier releases (and it doesn't look like all the fixes will make it into 17.06). These issues will result in connection errors during a rolling upgrade like the ones you're seeing.
If you have a true zero downtime deployment requirement and can't solve this with a client side retry, then I'd recommend putting in some kind of blue/green switch in front of your swarm and do the rolling upgrade to the non-active set of containers until docker finds solutions to all of the scenarios.
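The client-side retry mentioned above can be sketched as a small shell wrapper. The function name retry, its arguments, and the URL in the usage comment are illustrative assumptions; a real client would normally retry inside its own HTTP library.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1   # every attempt failed
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}

# Example (assumed URL): tolerate a brief 502 window during a rolling update.
# retry 5 1 curl -sf http://localhost/ >/dev/null
```

A retry like this only hides short gaps; it does not replace a health check that keeps traffic away from containers that are still starting.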

Related

Cannot get docker healthcheck to work with ECS Fargate v 1.4.0

I have a health check defined for my ECS Fargate service. It works when I test locally and works with Fargate platform version 1.3.0.
But when I change to Fargate platform version 1.4.0, it always turns unhealthy, even though the actual service is working: I can access the service on the container's public IP.
The health check is defined as:
"CMD-SHELL", "curl --fail http://localhost || exit 1"
So we looked into this and there's an issue in platform version 1.4 where, if the health check outputs anything to stderr, a false negative occurs. We will, obviously, fix this, but in the meantime you can work around it by (in this case) running curl in silent mode or simply redirecting stderr to /dev/null:
curl -s --fail http://localhost || exit 1
or
curl --fail http://localhost 2>/dev/null || exit 1
Should unblock you for now.
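To illustrate the stderr workaround in general terms, here is a sketch of a wrapper that discards stderr while leaving the command's exit status untouched. The names quiet and noisy_failure are assumptions made for this example; they are not ECS or curl features.

```shell
#!/bin/sh
# quiet COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND with stderr discarded, exit status unchanged.
quiet() {
    "$@" 2>/dev/null
}

# Stand-in for a noisy health check: writes to stderr, then fails with status 7.
noisy_failure() {
    echo "connection refused" >&2
    return 7
}

quiet noisy_failure || echo "exit status: $?"   # prints "exit status: 7" and nothing else
```

The health check still fails when the command fails; only the stderr output that triggered the false negative is removed.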
I wanted to collate some answers together and build on them, as follows.
I'm not being funny, but first and foremost make sure you have a healthcheck endpoint running somewhere. Note that this doesn't have to be inside your container! Let me show you what I mean:
curl -s --fail -I https://127.0.0.1:8000/ || exit 1
will only pass if you have an HTTP server running on localhost port 8000 (etc.). This can be anything that returns a 200; over to you.
Tips:
Make sure curl is installed inside the container
-s is for silent (no progress meter or error messages)
--fail makes curl exit with a non-zero status when the server returns an HTTP error (400 or above)
-I fetches the headers only
If localhost doesn't work try 127.0.0.1
Now, in my case I was not running an HTTP server but rather a long-running Python script. In its error state the script exits with 1 (which terminates the task), but otherwise (after a long time) it exits with 0. For the healthcheck to pass, the healthcheck call must also return 0 (otherwise it returns 1 and the task is again terminated*). [*Exit codes > 1 can be converted to 1; see the stolen trick below.]
So I had to fake a different endpoint with the same behaviour.
Step forward, Google.
curl -s --fail -I https://www.google.com || exit 1
As before, but now hitting an external endpoint kindly provided. Note the || exit 1, which converts any non-zero exit code to the 1 expected by the healthcheck.
Sorry to "state the bleeding obvious", but you really do need something answering at the endpoint; don't run curl against a local endpoint that nothing serves and expect a healthy status!
Remember to expose the https / http ports 443 / 80 in your Dockerfile and in the JSON task definition spec/through the console UI.
TIP! Note that the CMD-SHELL syntax is slightly different depending.
Putting it all together, for ECS Fargate the rest is correct.
You could also try an echo rather than a curl. I am unclear whether a point-to-point call is even required.
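A small sketch of the || exit 1 trick described above: whatever non-zero status the check returns, the health check sees exactly 1. The function name health_status and the simulated failures are assumptions for illustration.

```shell
#!/bin/sh
# health_status COMMAND [ARGS...]
# Hypothetical helper: prints 0 if COMMAND succeeds and 1 for any failure,
# mirroring what `COMMAND || exit 1` reports to the health checker.
health_status() {
    if ( "$@" >/dev/null 2>&1 || exit 1 ); then
        echo 0
    else
        echo $?
    fi
}

health_status true               # prints 0
health_status sh -c 'exit 22'    # prints 1 (22 is curl's exit code for --fail on an HTTP error)
```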

Issue accessing vespa outside docker container

I installed Docker on Mac and am trying to run Vespa on Docker, following the steps in this link:
https://docs.vespa.ai/documentation/vespa-quick-start.html
I didn't have any issues until step 4. I see the vespa container running after step 2, and step 3 returned a 200 OK response.
But step 5 failed to return a 200 OK response. Below is the command I ran in my terminal:
curl -s --head http://localhost:8080/ApplicationStatus
Whenever I run it without the -s option, I keep getting:
curl: (52) Empty reply from server
So I tried to see listening ports inside my vespa container and don't see anything for 8080 but can see for 19071(used in step 3)
➜ ~ docker exec vespa bash -c 'netstat -vatn| grep 8080'
➜ ~ docker exec vespa bash -c 'netstat -vatn| grep 19071'
tcp 0 0 0.0.0.0:19071 0.0.0.0:* LISTEN
Below doc has info related to vespa ports
https://docs.vespa.ai/documentation/reference/files-processes-and-ports.html
I'm assuming port 8080 should be active after docker run (step 2 of the quick start) and accessible from outside the container, since the port mapping is done.
But I don't see port 8080 active inside the container in the first place.
Am I missing something? Do I need to perform any step beyond those in the quick start? FYI, I installed Jenkins in Docker and was able to access it from outside the container via port mapping, but I'm not sure why it's not working with Vespa. I have been trying for quite some time with no progress. Please advise if I'm missing something here.
Your Docker container has too little memory: "Minimum 6GB memory dedicated to Docker (the default is 2GB on Macs).". See https://docs.vespa.ai/documentation/vespa-quick-start.html
The deadlock detector warnings and the failure to get configuration from the configuration server (which is likely OOM-killed) indicate that you are too low on memory.
My guess is that your jdisc container had not finished initializing or did not initialize properly. Did you check the log?
docker exec vespa bash -c '/opt/vespa/bin/vespa-logfmt /opt/vespa/logs/vespa/vespa.log'
This should tell you if there was something wrong. When it is ready to receive requests you would see something like this:
[2018-12-10 06:30:37.854] INFO : container Container.org.eclipse.jetty.server.AbstractConnector Started SearchServer@79afa369{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
[2018-12-10 06:30:37.857] INFO : container Container.org.eclipse.jetty.server.Server Started @10280ms
[2018-12-10 06:30:37.857] INFO : container Container.com.yahoo.container.jdisc.ConfiguredApplication Switching to the latest deployed set of configurations and components. Application switch number: 0
[2018-12-10 06:30:37.859] INFO : container Container.com.yahoo.container.jdisc.ConfiguredApplication Initializing new set of configurations and components. Application switch number: 1
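Checking for that readiness line can be automated. The sketch below reads log text on stdin and succeeds once a connector is reported as started on port 8080. The function name container_started and the grep pattern are assumptions derived from the sample log lines above, not an official Vespa interface.

```shell
#!/bin/sh
# container_started: reads log lines on stdin.
# Hypothetical helper: succeeds if a Jetty connector is reported as started on port 8080.
container_started() {
    grep -q 'AbstractConnector Started .*:8080}'
}

# Usage against a running container (log command taken from the answer above):
# docker exec vespa bash -c '/opt/vespa/bin/vespa-logfmt /opt/vespa/logs/vespa/vespa.log' \
#     | container_started && echo "port 8080 is up"
```

If the line never appears, the memory advice above is the first thing to check.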

What does the "(healthy)" string in STATUS stand for?

What does the "(healthy)" string in the STATUS column stand for?
user@user:~# docker ps
CONTAINER ID   IMAGE   COMMAND   CREATED   STATUS                  PORTS   NAMES
X              X       X         X         Up 20 hours             X       X
X              X       X         X         Up 21 hours (healthy)   X       X
That's the result of the HEALTHCHECK instruction. That instruction runs a command inside the container at a fixed interval (every 30 seconds by default). If the command succeeds, the container is marked healthy. If it fails too many times in a row, it's marked unhealthy.
You can set the interval, timeout, number of retries and start delay.
The following, for example, will check that your container responds to HTTP every 5 minutes with a timeout of 3 seconds.
HEALTHCHECK --interval=5m --timeout=3s \
CMD curl -f http://localhost/ || exit 1
You get a health_status event when the health status changes. You can follow those and others with docker events.
https://ryaneschinger.com/blog/using-docker-native-health-checks/
Normally it's something you launch with, to enable swarm or other services to check on the health of the container.
For example:
$ docker run --rm -it \
--name=elasticsearch \
--health-cmd="curl --silent --fail localhost:9200/_cluster/health || exit 1" \
--health-interval=5s \
--health-retries=12 \
--health-timeout=2s \
elasticsearch
See the health checks enabled at runtime?
It means they are using the HEALTHCHECK instruction:
https://docs.docker.com/engine/reference/builder/#healthcheck
When a container has a healthcheck specified, it has a health status in addition to its normal status. This status is initially starting. Whenever a health check passes, it becomes healthy (whatever state it was previously in). After a certain number of consecutive failures, it becomes unhealthy.
**starting** – Initial status while the container is still starting
**healthy** – If the command succeeds, the container is healthy
**unhealthy** – If a single run of the command takes longer than the specified timeout, that run counts as a failure. If a health check fails, the command is retried up to the configured number of retries, and the container is declared unhealthy if it still fails.
Reference
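The state transitions can be modelled in a few lines. This is only a simplified illustration of the rule quoted above (any pass gives healthy; a run of consecutive failures equal to the retry count gives unhealthy), not Docker's implementation. The function name health_after and the pass/fail input format are assumptions.

```shell
#!/bin/sh
# health_after RETRIES RESULT...
# Each RESULT is "pass" or "fail". Prints the health status after the last result.
health_after() {
    retries=$1
    shift
    state=starting
    failures=0
    for result in "$@"; do
        if [ "$result" = pass ]; then
            state=healthy      # any passing check resets the failure streak
            failures=0
        else
            failures=$((failures + 1))
            if [ "$failures" -ge "$retries" ]; then
                state=unhealthy
            fi
        fi
    done
    echo "$state"
}

health_after 3                       # prints: starting
health_after 3 fail fail pass        # prints: healthy
health_after 3 pass fail fail fail   # prints: unhealthy
```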

How to know if my program is completely started inside my docker with compose

In my CI chain I execute end-to-end tests after a "docker-compose up". Unfortunately my tests often fail because even if the containers are properly started, the programs contained in my containers are not.
Is there an elegant way to verify that my setup has completely started before running my tests?
You could poll the required services to confirm they are responding before running the tests.
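curl has built-in retry logic, and it's fairly trivial to build retry logic around some other type of service test.
#!/bin/bash
# Block until the URL answers, retrying once per second for up to $seconds (default 30).
await() {
    local url=${1}
    local seconds=${2:-30}
    # --retry-connrefused (curl 7.52.0+) also retries while nothing is listening yet;
    # without it, curl gives up immediately on "connection refused".
    curl --max-time 5 --retry 60 --retry-delay 1 --retry-connrefused \
        --retry-max-time "${seconds}" "${url}" \
        || exit 1
}

docker-compose up -d
await http://container_ms1:3000
await http://container_ms2:3000
run-ze-tests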
The alternative to polling is an event-based system.
Have all your services push notifications to an external service: scaeda gave the example of a log file, or you could use something like Amazon SNS. Your services emit a "started" event; you can then subscribe to those events and run whatever you need once everything has started.
Docker 1.12 did add the HEALTHCHECK build command. Maybe this is available via Docker Events?
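One way to use the health status from a CI script is to poll it until it reads healthy. The sketch below assumes the image defines a HEALTHCHECK. The function name wait_healthy, the timeout handling, and the container name in the usage comment are illustrative; to keep the loop testable without Docker, the command that prints the status is passed in as arguments.

```shell
#!/bin/sh
# wait_healthy TIMEOUT_SECONDS STATUS_COMMAND [ARGS...]
# Hypothetical helper: polls STATUS_COMMAND once per second until it prints
# "healthy". Fails if it prints "unhealthy" or the timeout expires.
wait_healthy() {
    remaining=$1
    shift
    while [ "$remaining" -gt 0 ]; do
        state=$("$@" 2>/dev/null) || state=unknown
        case $state in
            healthy) return 0 ;;
            unhealthy) return 1 ;;
        esac
        remaining=$((remaining - 1))
        sleep 1
    done
    return 1
}

# Usage with a real container (assumed name "container_ms1"):
# wait_healthy 60 docker inspect --format '{{.State.Health.Status}}' container_ms1 && run-ze-tests
```

docker inspect reports the same starting, healthy, and unhealthy values that docker ps shows in the STATUS column.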
If you have control over the docker engine in your CI setup you could execute docker logs [Container_Name] and read out the last line which could be emitted by your application.
RESULT=$(docker logs [Container_Name] 2>&1 | grep [Search_String])
logs output example:
Agent pid 13
Enter passphrase (empty for no passphrase): Enter same passphrase again: Identity added: id_rsa (id_rsa)
#host SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.6
#host SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.6
parse specific line:
RESULT=$(docker logs ssh_jenkins_test 2>&1 | grep Enter)
result:
Enter passphrase (empty for no passphrase): Enter same passphrase again: Identity added: id_rsa (id_rsa)

Distributed RabbitMQ Nodes don't recognize each other

I'm working on a RabbitMQ distributed POC and I'm stuck at the basics of clustering the nodes.
I'm trying to follow the rabbit's tutorial on clustering so this is my reference.
After installing erlang (R14B04) and rabbit (2.8.2-1) I've copied the .erlang.cookie file contents from one node to the other two.
I wasn't sure how to get Erlang to notice this change, so I had to restart the machines themselves (pretty brute force, but I don't know Erlang at all).
In addition, I opened port 4369 and 5 additional ports in iptables for communication, and placed the following config in
/usr/lib64/erlang/bin/sys.config:
[{kernel,[{inet_dist_listen_min, XX00},{inet_dist_listen_max, XX05}]}].
Then another restart (dumb I know) to verify erlang takes these into consideration but still when I run:
rabbitmqctl cluster rabbit@HostName1
I get:
Clustering node rabbit@HostName2 with [rabbit@HostName1] ...
Error: {no_running_cluster_nodes,[rabbit@HostName1],
[rabbit@HostName1]}
There is a chance my fiddling with the erlang.cookie or with the ports did not succeed but I don't know how to check them. I tried typing erl in the cmd and then erl_epmd:names() or other commands to get more information but I'm probably way off in erlang land.
Would truly appreciate any help
Update:
I tried pinging two erlang nodes manually and got pang back.
I did the following:
Connected to the two nodes and stopped rabbitmq (wasn't sure if that was needed, but to be safe). Started Erlang on each (erl -sname dilbert and erl -sname dilbert2). When the Erlang shell started I ran node(). on each of them and got dilbert@HostName1 and dilbert2@HostName2 respectively. I then tried net_adm:ping('dilbert'). and net_adm:ping('dilbert@HostName1'). with and without the single quotes from both nodes (changing the names accordingly) and got pang in all 8 cases.
When I ran nodes(). on one of the machines I got back an empty array.
I've also tried to allow all traffic in the firewall (script) and then try to run the above commands (don't worry they're back on now) and still got back pang.
Update2:
For some reason I had a cookie mismatch which I needed to resolve (thanks @kjw0188 for the suggestion [I ran erlang:get_cookie(). in the erlang command line]).
This did not help, and I needed to stop iptables completely (not sure why, but I'll figure it out soon) and start the erlang node with -name dilbert@my-ip because my rackspace servers have no DNS name. This finally enabled me to get a pong and have the nodes see each other (nodes(). returns a non-empty array after the ping).
The problem I'm facing now is how to instruct RabbitMQ to use -name instead of -sname when starting erlang.
So I had multiple issues with connecting my two RabbitMQ nodes-
I'll add that my nodes are hosted on rackspace, and so don't have a default exposable hostname, and require iptables since there is no DMZ or built in security group concept like amazon.
Problems:
1. Cookie- Not sure how or why but I had multiple instances of .erlang.cookie (in /root, in my home directory and in /var/lib/rabbitmq/) I kept only the one in rabbitmq and verified all nodes have the same cookie.
2. IPTables- In order for the nodes to communicate I needed to open the epmd port and the range of ports for the actual communication inet_dist_listen_min inet_dist_listen_max.
/sbin/iptables -A INPUT -i eth1 -p tcp --dport ${epmd} -s ${otherNode} -j ACCEPT
/sbin/iptables -A INPUT -i eth1 -p tcp --dport ${inet_dist_listen_min}:${inet_dist_listen_max} -s ${otherNode} -j ACCEPT
epmd is the usual 4369 port and for the other range use whatever range you want.
${otherNode} is the ip of my other node.
I also needed to configure erlang through rabbitmq to use these ports (see config file at end)
3. HostName- Seeing as I don't have a hostname I needed to edit the rabbit scripts to use -name and not -sname (the first tells erlang to take the whole name, the latter stands for short name and thus appends an @ symbol and the hostname).
This was accomplished by editing:
/usr/lib/rabbitmq/bin/rabbitmqctl
Added at the beginning the definition of the RABBITMQ_NODE_IP_ADDRESS property
DEFAULT_NODE_IP_ADDRESS=auto
DEFAULT_NODE_PORT=5672
[ "x" = "x$RABBITMQ_NODE_IP_ADDRESS" ] && RABBITMQ_NODE_IP_ADDRESS=${NODE_IP_ADDRESS}
[ "x" = "x$RABBITMQ_NODE_PORT" ] && RABBITMQ_NODE_PORT=${NODE_PORT}
[ "x" = "x$RABBITMQ_NODE_IP_ADDRESS" ] && [ "x" != "x$RABBITMQ_NODE_PORT" ] && RABBITMQ_NODE_IP_ADDRESS=${DEFAULT_NODE_IP_ADDRESS}
[ "x" != "x$RABBITMQ_NODE_IP_ADDRESS" ] && [ "x" = "x$RABBITMQ_NODE_PORT" ] && RABBITMQ_NODE_PORT=${DEFAULT_NODE_PORT}
and in the actual erl command I changed
-sname ${RABBITMQ_NODENAME} \
to
-name ${RABBITMQ_NODENAME}@${RABBITMQ_NODE_IP_ADDRESS} \
This made rabbitmq listen only on the specified ip address (specified in the config file at the end) and load with that ip instead of the usual hostname.
Edited /usr/lib/rabbitmq/bin/rabbitmq-server
Changed the actual erl command from -sname ${RABBITMQ_NODENAME} \ to -name ${RABBITMQ_NODENAME}@${RABBITMQ_NODE_IP_ADDRESS} \
Added a rabbit conf (/etc/rabbitmq/rabbitmq-env.conf) file with-
#the ip address which rabbit should use, this is to limit rabbit to only use internal rackspace communication and not publicly accessible ports
NODE_IP_ADDRESS=myIpAdress
#had to change the nodename because otherwise rabbitmq used rabbit@Hostname and not only rabbit
NODENAME=myCompany
#This instructed rabbit to instruct erlang which ports it should use for its communications with other nodes
export SERVER_ERL_ARGS="$SERVER_ERL_ARGS -kernel inet_dist_listen_min somePort -kernel inet_dist_listen_max someOtherBiggerPort"
Some resources which helped me along the way:
RabbitMQ Clustering Guide
Clustering RabbitMQ servers for High Availability
rabbitmq-env.conf(5) manual page
Node communication by public IP address erlang mailing list (The middle post)
Configuring RabbitMQ Cluster on Cloud
Hope this will help anyone else.
EDIT:
Not sure how I was mistaken but it seemed my erlang-rabbit port instructions were not taken into consideration or were not enough. Ended up having to allow all communications between the two nodes...
One thing to really watch out for is whitespace of any kind in the erlang cookie file, especially line breaks AFTER the contents of the cookie. So long as both are identical, things are okay, but when one has a line break and the other doesn't, things won't work.
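A byte-for-byte comparison catches exactly this kind of mismatch. The sketch below uses cmp, which, unlike a visual check, also notices a trailing line break. The helper name same_cookie and the temporary files standing in for two nodes' cookie files are assumptions.

```shell
#!/bin/sh
# same_cookie FILE_A FILE_B
# Hypothetical helper: succeeds only if both files are byte-for-byte identical.
same_cookie() {
    cmp -s "$1" "$2"
}

# Demonstration with two stand-in cookie files (not real cookies).
a=$(mktemp)
b=$(mktemp)
printf 'SECRETCOOKIE' > "$a"      # no trailing line break
printf 'SECRETCOOKIE\n' > "$b"    # same text plus a line break

if same_cookie "$a" "$b"; then
    echo "cookies match"
else
    echo "cookies differ"         # this branch runs: the files differ by one byte
fi
rm -f "$a" "$b"
```

On real nodes, one would copy the other node's cookie to a temporary file and compare it against the local .erlang.cookie in the same way.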
Background: I was facing the same issue while setting up Rabbitmq cluster. I was using 2 docker containers running on my host-machine, which is equivalent to 2 separate nodes and I could not create a cluster of these two.
Solution:
1. Make sure you have the same Erlang cookie on all your cluster nodes; the default location is /var/lib/rabbitmq/.erlang.cookie. This file is used for authentication, so it must be identical on all the nodes. After changing the .erlang.cookie, restart your rabbitmq service.
2. Make sure the nodes can reach one another; use ping or telnet to check the connection.
3. Check that /etc/hosts has correct entries. For example, if rabbit2 wants to join the cluster of rabbit1, /etc/hosts on rabbit2 should contain:
172.68.1.6 rabbit1
172.68.1.7 rabbit2
4. Now stop the app using rabbitmqctl stop_app, followed by rabbitmqctl join_cluster rabbit@rabbit1, start it again with rabbitmqctl start_app, and check rabbitmqctl cluster_status to see whether you have joined the cluster.
I followed the rabbitmq official documentation to setup the cluster.
To change RabbitMQ's sname/name behaviour you can edit the scripts:
rabbitmq-multi
rabbitmq-server
rabbitmqctl
Example
In script rabbitmqctl there is the following piece of code:
exec erl \
-pa "${RABBITMQ_HOME}/ebin" \
-noinput \
-hidden \
${RABBITMQ_CTL_ERL_ARGS} \
-sname rabbitmqctl$$ \
-s rabbit_control \
-nodename $RABBITMQ_NODENAME \
-extra "$@"
You have to change it to:
exec erl \
-pa "${RABBITMQ_HOME}/ebin" \
-noinput \
-hidden \
${RABBITMQ_CTL_ERL_ARGS} \
-name rabbitmqctl$$ \
-s rabbit_control \
-nodename $RABBITMQ_NODENAME \
-extra "$@"
http://pearlin.info/?p=1672
So you need to copy the cookie from the node you are trying to connect to.
Example: rabbit@node1 and rabbit@node2.
Go to rabbit@node1 and copy the cookie from cat /var/lib/rabbitmq/.erlang.cookie
Go to rabbit@node2, remove the current cookie and paste the new one.
On the same node (rabbit@node2), run:
/usr/sbin/rabbitmqctl stop_app
/usr/sbin/rabbitmqctl reset
/usr/sbin/rabbitmqctl cluster rabbit@node1
That should do it.
The same is documented here:
http://pearlin.info/?p=1672

Resources